A Survey on Human-Centric LLMs

JINGYI WANG*, Tsinghua University, China

NICHOLAS SUKIENNIK*, Tsinghua University, China

TONG LI, Tsinghua University, China

WEIKANG SU, Tsinghua University, China

QIANYUE HAO, Tsinghua University, China

JINGBO XU, Tsinghua University, China

ZIHAN HUANG, Tsinghua University, China

FENGLI XU, Tsinghua University, China

YONG LI, Tsinghua University, China

arXiv:2411.14491v2 [cs.CL] 26 Nov 2024

The rapid evolution of large language models (LLMs) and their capacity to simulate human cognition and behavior has given rise to LLM-based frameworks and tools that are evaluated and applied based on their ability to perform tasks traditionally performed by humans, namely those involving cognition, decision-making, and social interaction. This survey provides a comprehensive examination of such human-centric LLM capabilities, focusing on their performance in both individual tasks (where an LLM acts as a stand-in for a single human) and collective tasks (where multiple LLMs coordinate to mimic group dynamics). We first evaluate LLM competencies across key areas including reasoning, perception, and social cognition, comparing their abilities to human-like skills. Then, we explore real-world applications of LLMs in human-centric domains such as behavioral science, political science, and sociology, assessing their effectiveness in replicating human behaviors and interactions. Finally, we identify challenges and future research directions, such as improving LLM adaptability, emotional intelligence, and cultural sensitivity, while addressing inherent biases and enhancing frameworks for human-AI collaboration. This survey aims to provide a foundational understanding of LLMs from a human-centric perspective, offering insights into their current capabilities and potential for future development.

Additional Key Words and Phrases: Large Language Models, Human-Centered Computing.

1 INTRODUCTION

As large language models (LLMs) [1, 2], such as OpenAI’s GPT family [3, 4] and Meta’s LLaMA [5, 6], continue to evolve, their ability to simulate, analyze, and influence human behavior is growing at an unprecedented rate. These models can now process and generate human-like text and perform cognitive tasks at levels comparable to humans in many situations, providing new tools for understanding human cognition, decision-making, and social dynamics.

As such, this survey aims to provide a comprehensive evaluation of LLMs from a human-centric perspective, focusing on their ability to simulate, complement, and enhance human cognition and behavior, both on an individual and collective level. While LLMs have traditionally been rooted in computer science and engineering [7, 8], their increasing sophistication in replicating human-like reasoning, decision-making, and social interactions has expanded their use into domains where humans are the focal point. This has allowed researchers to address questions that were once too intricate or abstract for computational analysis. For example, in political science, LLMs are used to analyze political discourse, detect biases, and model election outcomes [9]; in sociology, they assist in understanding social media conversations, public sentiment, and group behaviors [10]; and in psychology, they help model human cognition and decision-making [11]. LLMs have also revolutionized linguistics by enabling large-scale analysis of language, from syntax and semantics to pragmatics [12], and in economics, they allow for modeling complex interactions between policies and societal outcomes [13].

Authors’ addresses: JingYi Wang*, Tsinghua University, Beijing, China, jy-w22@; Nicholas Sukiennik*, Tsinghua University, Beijing, China, sukiennikn10@; Tong Li, Tsinghua University, Beijing, China, tongli@; Weikang Su, Tsinghua University, Beijing, China; Qianyue Hao, Tsinghua University, Beijing, China; Jingbo Xu, Tsinghua University, Beijing, China; Zihan Huang, Tsinghua University, Beijing, China; Fengli Xu, Tsinghua University, Beijing, China; Yong Li, Tsinghua University, Beijing, China, liyong07@.

J. ACM, Vol. V, No. N, Article. Publication date: November 2024.

To structure this investigation, the survey is divided into two main sections. First, we evaluate human-centric LLMs, focusing on their cognitive, perceptual, social, and cultural competencies. This section examines how LLMs perform tasks commonly associated with human cognition, such as reasoning, perception, emotional awareness, and social understanding. We assess their strengths in structured reasoning, pattern recognition, and creativity, while identifying their limitations in areas such as real-time learning, empathy, and handling complex, multi-step logic. By benchmarking LLM performance against human standards, we highlight areas where LLMs excel and where further improvements are needed.

Second, we explore LLMs in human-centric applied domains, where LLMs are used in real-world scenarios that traditionally require human input. This section is divided into studies focusing on individual and collective applications. Individual-focused studies involve an LLM performing tasks typically done by a single human, such as decision-making, problem-solving, or content creation. Collective-focused studies explore how multiple LLMs can work together to simulate group behaviors, interactions, or collaborative tasks, offering insights into social dynamics, organizational behavior, and multi-agent coordination. In both contexts, we examine the methods employed, such as basic prompting, multi-agent prompting, and fine-tuning, along with the theoretical frameworks that guide these applications, including game theory, social learning theory, and theory of mind.

Ultimately, this survey seeks to provide a detailed understanding of how LLMs can better align with human behaviors and social contexts, identifying both their strengths and areas for improvement. Figure 1 provides an overview of this framework. It categorizes LLM capabilities into individual skills, such as cognition, perception, analysis, and executive functioning, and collective skills such as social abilities, and it highlights how these capabilities are applied to studies across individual domains like behavioral science, psychology, and linguistics, and collective domains including political science, economics, and sociology. In classifying research works with this framework, we offer insights into how LLMs can be made more effective, ethical, and realistic tools for research and practical applications, whether in individual or collective human-centric settings.

The main contributions of this paper can be summarized as follows.

• We provide an in-depth evaluation of LLM capabilities in human-centric tasks, focusing on their cognitive, perceptual, and social competencies, and comparing their performance to human-like reasoning, decision-making, and emotional understanding.

• We explore LLMs’ capabilities in human-centric domains, focusing on real-world applications in individual and collective contexts, assessing their ability to replicate human behaviors in fields such as behavioral science, political science, economics, and sociology, both as single-agent models and in multi-agent systems.

• We identify key challenges and future research directions, including improving LLMs’ real-world adaptability, emotional intelligence, and cultural sensitivity, while addressing biases and developing more advanced frameworks for human-AI collaboration.

The paper is organized as follows: Section 2 provides an overview of AI-empowered human-centric studies and LLMs, while Section 3 evaluates LLM competencies across cognitive, perceptual, analytical, executive, and social skills. Section 4 discusses how LLMs can be applied in a variety of interdisciplinary scenarios to both enhance LLM development and assist in human-centered tasks. Section 5 explores open challenges and outlines future directions for advancing LLMs. Section 6 summarizes key insights and emphasizes the importance of interdisciplinary collaboration to enhance LLMs’ understanding of human behavior.

Fig. 1. Our framework depicts how LLMs are evaluated on foundational human-like skills, divided into individual (e.g., cognition, perception, analysis, executive functioning) and collective (e.g., sociability) levels, and applied within various fields of study similarly categorized as individual (e.g., Behavioral Science, Psychology, Linguistics) and collective (e.g., Political Science, Economics, Sociology) domains.

2 OVERVIEW

2.1 Human-Centric Artificial Intelligence

2.1.1 Traditional AI Approaches in Human-Centric Studies. The application of AI in various human-centered fields has undergone a long progression, now reaching a pinnacle with the rise of generative models, with AI methods being used to investigate various human phenomena. However, despite their relative naivety compared to LLMs, traditional methods have nonetheless enabled researchers to address complex social phenomena through computation.

For almost as long as it has been investigated, AI has been used in areas that are highly impactful on society [14]. Since then, researchers have evaluated the many ways in which AI could emulate human behavior and thought processes, for example in cognition [15], perception [16], and executive function [17]. More recently, though, with the rise of the web and social media, AI’s uses have come closer to our day-to-day lives. For example, in political communication research, the detection of political bias in news articles has emerged as a critical area of study, particularly given the increasing polarization in media and online spaces. Predicting political ideology with traditional methods based on statistical modeling and network analysis has become an urgent task due to the vast amount of content produced daily. For instance, research by [18] employed network analysis to estimate ideological preferences of social media users. Moreover, techniques like topic modeling and content analysis have been widely used to identify bias and misinformation in news articles using data-mining methods [19, 20], highlighting the use of traditional AI techniques in understanding political discourse. Other works tackled the task of stance detection using methods like recursive neural networks [21] and clustering algorithms [22]. Furthermore, Dezfouli et al. [23] explore adversarial vulnerabilities in decision-making models, which is crucial when considering the robustness of traditional bias detection systems under adversarial conditions. In addition, Dafoe et al. [24] emphasize the importance of systems designed to navigate social environments, such as political discourse, using more established multi-agent systems and game theory frameworks. Meanwhile, machine understanding of human preferences has also been used to optimize the learning of reward functions in reinforcement learning [25], showing us that AI methods not only help us explain human behavior, but can also benefit from understanding it, highlighting the co-evolutionary nature of advancements in both AI techniques and human-centric studies.

Overall, the vast body of AI-empowered human-centric studies points to the burgeoning potential of using more advanced computational methods, such as LLMs, to both understand and better simulate human behavior and reasoning processes. LLMs can present new opportunities in the field by simulating human behaviors in areas where real-world data is scarce, as well as by facilitating inquiry into laws and dynamics of human behavior based on LLM replicability.

2.1.2 A Paradigm Shift from Traditional AI to LLMs. The rise of LLMs has transformed natural language processing (NLP) and artificial intelligence in general through key breakthroughs in model architecture, scale, and capabilities. Early models like Word2Vec and GloVe used word embeddings, but the introduction of the Transformer in 2017 [26], with its self-attention mechanism, enabled deeper contextual understanding and marked a turning point. OpenAI’s GPT series, beginning in 2018 with GPT [3], capitalized on this, culminating in GPT-3 [27] and GPT-4 [28], which demonstrated unprecedented capabilities in reasoning, text generation, and multimodal tasks. Meanwhile, Google’s PaLM 2 [29] advanced multilingualism and efficiency, and open-source models like Falcon [30] and Baidu’s ERNIE Bot [31] broadened access and specialization. These developments reflect the growing impact of LLMs across diverse domains, from interdisciplinary research to ethical AI applications.
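To make the self-attention mechanism mentioned above more concrete, the following is a minimal sketch of scaled dot-product attention in plain Python. It is an illustration under simplifying assumptions rather than the Transformer as implemented in [26]: the learned query, key, and value projections and the multi-head structure are omitted, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input (assumption): three token vectors used directly as queries,
# keys, and values, i.e. without learned projection matrices.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens, tokens, tokens)
```

Each output vector mixes information from every position in the sequence, which is what allows a token's representation to depend on its context, in contrast to the fixed per-word vectors of Word2Vec or GloVe.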

The rapid adoption of LLMs across academic disciplines has led to varying predictions about whether these systems will eventually match human cognitive abilities. While some experts foresee AI achieving human-like general intelligence in the near future, others remain more cautious, doubting whether AI can fully replicate the complex, abstract reasoning and creativity that define human cognition [32]. Despite these differing viewpoints, AI is already a significant force in everyday life, influencing decision-making and information processing across numerous domains. However, a key distinction remains: human cognition is driven by forward-thinking, theory-based reasoning, while AI operates on patterns derived from vast datasets, often relying on probability and past data [33]. This difference underscores the complementary nature of human and AI systems, with each excelling in distinct aspects of cognitive processing.

Unlike human intelligence, LLMs operate without inherent goals, values, or emotional experiences. Human cognition, driven by survival, social interaction, and creativity, is deeply connected to our physical and social environments. Even embodied AI, while capable of interacting with its surroundings, lacks the nuanced, purpose-driven intelligence that defines human thought. In contrast, LLMs generate responses based on probabilistic models derived from large datasets, without the lived experiences that inform human decision-making. Though LLMs can simulate certain human-like behaviors, they still fall short of the embodied understanding humans possess.

These distinctions raise critical questions about the limitations and potentials of AI, especially as we consider the diverse capabilities explored in Section 3, which discusses the capabilities of LLMs including cognitive, perceptual, social, analytical, executive, cultural, moral, and collaborative skills. Section 4 delves into how interdisciplinary fields, such as political science, economics, sociology, behavioral science, psychology, and linguistics, contribute to LLM development, offering insights into how human intelligence informs and shapes the evolution of artificial systems. This exploration emphasizes the importance of leveraging LLM strengths while recognizing the fundamental differences between human and artificial cognition.

3 EVALUATION OF HUMAN-CENTRIC LLMS

To evaluate human-centric LLMs, we showcase a holistic representation of LLM competencies, categorized into two domains: individual (e.g., cognitive, perceptual, analytical, executive functioning skills) and collective (e.g., social skills), as shown in Figure 2. This representation includes various key LLM skills, such as reasoning, pattern recognition, spatial awareness, adaptability, decision-making, interpersonal communication, and cultural competency. Following this, Figure 3 outlines the evaluation approaches used to assess LLMs, including benchmark and dataset testing, human-centric evaluations, interactive and simulation-based evaluations, ethical and bias assessments, and lastly, explainability and interpretability evaluations. Table 1 highlights both the strengths and areas for improvement in these domains. By outlining these abilities, we provide a comprehensive comparison of human-like skills, using benchmarks to assess their strengths and limitations. Additionally, Appendix Tables 2 and 3 provide a comprehensive overview of key papers, highlighting their contributions, the LLMs assessed, and comparisons to human performance. The subsequent section delves into each category, providing an in-depth exploration of the skills and benchmarks that define LLM performance across these domains.

Fig. 2. Overview of LLM Capabilities Across Individual and Collective Domains. [Recoverable figure labels: Individual; Information Processing; Pattern Recognition; Cultural Competence.]

3.1 Cognitive Skills

LLMs demonstrate cognitive competencies that mirror key elements of human intelligence, primarily through reasoning and learning. While LLMs show remarkable ability in processing vast amounts of information and generating coherent responses, their proficiency varies when it comes to complex cognitive tasks. These models showcase evolved abilities in structured reasoning and generalization but encounter challenges when faced with intricate logic or learning from real-time interactions. This section explores the strengths and limitations of LLMs in reasoning and learning, highlighting their progress and areas that require further advancement.

3.1.1 Reasoning. Logical reasoning, a core element of human cognition and essential for daily functioning, consists of various types of reasoning, including deductive, inductive, and causal reasoning, each contributing to how we process information and make decisions. Deductive reasoning applies general principles to obtain specific conclusions, while inductive reasoning draws generalizations from specific observations [34], and causal reasoning helps to understand cause-and-effect relationships [35, 36].
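The contrast between deductive and inductive reasoning can be sketched with a toy example. The code below is only an illustration of the two definitions above, not a method from any cited benchmark; the rule format, predicate names, and observations are hypothetical.

```python
def forward_chain(facts, rules):
    """Deduction: repeatedly apply 'if premise then conclusion' rules to known facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def induce_shared_properties(observations):
    """Induction: propose as a general rule the properties shared by every observed case."""
    property_sets = [set(props) for props in observations.values()]
    return set.intersection(*property_sets) if property_sets else set()


# Deduction: general rules applied to a specific fact (hypothetical predicates).
rules = [("is_sparrow", "is_bird"), ("is_bird", "has_feathers")]
conclusions = forward_chain({"is_sparrow"}, rules)

# Induction: specific observations generalized into a candidate rule.
observed = {
    "sparrow": ["flies", "has_feathers"],
    "eagle": ["flies", "has_feathers"],
    "robin": ["flies", "has_feathers", "sings"],
}
generalization = induce_shared_properties(observed)
```

The deduced conclusions are guaranteed by the premises, whereas the induced generalization ("birds fly and have feathers") is only as good as the sample: adding a penguin to `observed` would remove "flies" from the result.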

Several benchmark datasets have been developed to assess these reasoning capabilities in LLMs. For deductive reasoning, the LogiQA 2.0 dataset [37] is a notable resource, focusing on five types of reasoning: categorical, necessary conditional, sufficient conditional, conjunctive, and disjunctive reasoning. PrOntoQA [38] also evaluates deductive reasoning through first-order logic tasks where LLMs derive specific conclusions from logical premises. For inductive reasoning, CommonsenseQA 2.0 [39] requires generalization from everyday facts and commonsense knowledge, whereas the Creak dataset [40] further tests LLMs’ ability to generalize from commonsense knowledge to identify inconsistencies. In turn, causal reasoning is assessed using CausalBench [41], which evaluates LLMs’ ability to reason about cause-and-effect relationships across diverse domains. ContextHub [42], on the other hand, serves as another benchmark focusing on LLMs’ causal reasoning in both abstract and contextualized tasks. Additional datasets like GSM8K [43] and BIG-Bench-Hard [44] are furthermore employed for mathematical reasoning and for evaluating LLM performance across various reasoning domains, respectively.

Analyzing LLM performance with these datasets has revealed significant insights into their reasoning abilities and limitations. For deductive reasoning, although LLMs like GPT-3 have made progress, their accuracy remains at 68.65% in tasks involving logical inference, which is significantly below the 90% human benchmark [37]. This gap indicates ongoing challenges in mastering complex logical structures, especially when multiple logical steps or intricate reasoning processes are required. LLMs like GPT-3.5, PaLM, and LLaMA perform well on simpler deductive reasoning tasks but struggle with more complex scenarios that involve chaining multiple logical premises together [45]. For inductive reasoning, on the other hand, GPT-4 shows improvements in rule application with up to 99.5% partial accuracy [46], yet struggles with larger problems and minimal examples. Even with Chain-of-Thought (CoT) prompting, GPT-4 and Davinci face difficulties in rule validation and integrating complex rules, with Davinci’s accuracy declining to 51% in nuanced tasks [47]. In addition, Han et al. [47] evaluate GPT-3.5 and GPT-4 on property induction tasks, highlighting that while GPT-4 more closely aligns with human reasoning patterns, the models still struggle to fully capture premise non-monotonicity, a critical element of human cognitive processing.
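Comparisons such as the one between 68.65% model accuracy and the 90% human benchmark rest on a simple scoring procedure. The sketch below shows one plausible way to compute exact-match accuracy and the resulting gap in percentage points; it is an assumption for illustration, not the scoring code of any cited benchmark, and the toy answers are hypothetical.

```python
def accuracy(predictions, gold):
    """Percentage of predictions that exactly match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty benchmark")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def gap_to_human(model_accuracy, human_accuracy):
    """Percentage-point shortfall of the model relative to the human baseline."""
    return human_accuracy - model_accuracy


# Hypothetical multiple-choice answers, not taken from any benchmark.
gold = ["A", "C", "B", "D"]
predictions = ["A", "C", "D", "D"]
toy_score = accuracy(predictions, gold)

# Figures quoted in the text: 68.65% model accuracy vs. a 90% human benchmark.
reported_gap = gap_to_human(68.65, 90.0)
```

With the figures quoted above, the shortfall is about 21.35 percentage points. Partial-accuracy metrics, such as the one reported for rule application, would require a different scoring function than exact match.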

Causal reasoning remains a significant challenge for LLMs like GPT-4 and Davinci, as it requires a deep understanding of cause-and-effect across various contexts. Although these models show reasonable proficiency in mathematical causal tasks, the CausalBench benchmark highlights their struggles with more complex text-based and coding-related causal problems [41]. Interpreting causal structures in narratives or code snippets often goes beyond simple data correlations, demanding robust reasoning to avoid producing misleading outputs. Even when GPT-4 initially performs well, its reasoning capabilities frequently weaken when faced with flawed or conflicting arguments, raising concerns about its consistency in complex scenarios [48].

Fig. 3. Overview of LLM evaluations. [Panels: Benchmark & Dataset Testing (Standardized Benchmarks, Custom Benchmarks, Performance Metrics); Human-Centric Evaluations (Expert Evaluations, Crowdsourced Evaluations, Human-in-the-Loop Testing); Interactive & Simulation-Based Evaluations (Single-Agent Simulations, Multi-Agent Simulations, Task-Oriented Dialogues); Ethical & Bias Assessments (Bias Detection, Fairness Metrics, Ethical Compliance); Explainability & Interpretability (Transparency of Reasoning, User Interpretability, Technical Interpretability).]

The ContextHub benchmark was developed to assess LLMs like GPT-4, PaLM, and LLaMA in handling both abstract and contextualized logical problems [42]. ContextHub focuses on the challenges these models encounter when transitioning from simple logic tasks to nuanced, real-world reasoning. While models perform well with straightforward problems, they often struggle to generalize in context-rich scenarios requiring deeper interpretative skills. Additional datasets like GSM8K emphasize deductive reasoning, and BIG-Bench-Hard evaluates multi-step reasoning, factual knowledge, and commonsense understanding [43, 44]. Together, these benchmarks reveal critical insights into the strengths and limitations of models like GPT-4 and Davinci, pinpointing areas that need improvement for handling complex, real-world reasoning tasks.

Overall, these benchmark datasets provide a comprehensive evaluation framework for assessing LLMs’ reasoning capabilities, revealing both their advancements and limitations. While LLMs have shown progress in handling specific reasoning tasks, they continue to face significant challenges in multi-step logic, contextual problem-solving, and generalizing their reasoning abilities across diverse domains.

3.1.2 Learning. LLMs’ learning ability encompasses their capacity to adapt, generalize, and improve performance based on pre-existing training data and interactions with users or environments. Unlike traditional learning models, LLMs do not update their parameters during inference. Instead, they rely on pre-trained knowledge to perform few-shot or zero-shot tasks, highlighting their generalization capabilities. However, this comes with significant limitations when faced with evolving, real-world data.
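The difference between zero-shot and few-shot use can be illustrated at the level of the prompt alone. The sketch below assembles both kinds of prompt as plain text; the "Q:/A:" template, the function name, and the example questions are assumptions chosen for illustration, and no model is called.

```python
def build_prompt(question, demonstrations=()):
    """Assemble a zero-shot (no demonstrations) or few-shot prompt as plain text."""
    # Each demonstration is a solved (question, answer) pair placed before the query.
    blocks = [f"Q: {q}\nA: {a}" for q, a in demonstrations]
    # The final block leaves the answer open for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


zero_shot = build_prompt("Is a sparrow a bird?")
few_shot = build_prompt(
    "Is a sparrow a bird?",
    demonstrations=[("Is a salmon a fish?", "Yes"), ("Is a bat a bird?", "No")],
)
```

In both cases the model's parameters stay fixed; the demonstrations only change the input text, which is why this form of adaptation does not persist beyond the prompt.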

Recent efforts have aimed at improving LLM adaptability through various strategies. For instance, the RL with Guided Feedback (RLGF) framework [49] optimizes learning from feedback, showing that guided strategies can significantly improve text generation in dynamic conditions. Similarly, error-driven learning approaches, like LEMA (Learning from MistAKes) [50], allow models like GPT-4 to refine reasoning by identifying and correcting errors. These approaches highlight the potential of leveraging feedback and error correction to boost adaptability, yet they still rely on static data at inference.


[Table 1: strengths and areas for improvement of LLMs across Cognition, Perception, Analysis, Executive Function, and Sociability. The table was flattened and truncated in extraction, so the assignment of entries to rows and columns is not recoverable. Recoverable entries, in source order: High accuracy in information retrieval; Structured metadata-based queries; High volume of ideas in structured tasks; Nuanced emotional regulation; Real-world, dynamic challenge adaptation; Entity-based reasoning with structured datasets; Abstract logic reasoning in structured contexts; Contextual cue-based reasoning; Abstract common-sense reasoning; Contextual logical reasoning; Contradictory task handling; Multi-step reasoning with real-world application; Structured, predefined task handling; Context-specific empathy; Complex; Dynamic planning; Real-time adjustments; Controlled virtual environments; understanding; Social context navigation; mental state; Basic empathy tasks; False belief and indirect cue recognition; More original, dive]
