Agent Laboratory: Using LLM Agents as Research Assistants

Samuel Schmidgall¹,², Yusheng Su¹, Ze Wang¹, Ximeng Sun¹, Jialian Wu¹, Xiaodong Yu¹, Jiang Liu¹, Zicheng Liu¹ and Emad Barsoum¹
¹AMD, ²Johns Hopkins University

Corresponding author(s): Samuel Schmidgall (sschmi46@)

arXiv:2501.04227v1 [cs.HC] 8 Jan 2025
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages (literature review, experimentation, and report writing) to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluating the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) the generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.
https://AgentLaboratory.github.io
Figure 1 | Agent Laboratory takes as input a human research idea and a set of notes, provides this to a pipeline of specialized LLM-driven agents, and produces a research report and code repository.
1. Introduction
Scientists frequently face constraints that limit the number of research ideas they can explore at any given time, resulting in ideas being prioritized based on predicted impact. While this process helps determine which concepts are worth investing time in and how best to allocate limited resources, many high-quality ideas remain unexplored. If the process of exploring ideas had fewer limitations, researchers would be able to investigate multiple concepts simultaneously, increasing the likelihood of scientific discovery.
In an effort to achieve this, recent work has explored the capability of LLMs to perform research ideation and automated paper generation, where LLM agents perform the role of human scientists (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Lu et al. (2024a); Swanson et al. (2024)). The work of Baek et al. (2024) introduces ResearchAgent, which automatically generates research ideas, methods, and experiment designs, iteratively refining them through feedback from multiple reviewing agents that mirror peer discussions and leverage human-aligned evaluation criteria to improve the outputs. Lu et al. (2024a) explores fully automated paper generation, where The AI Scientist framework generates novel research ideas, writes code, conducts experiments, and creates a full scientific paper with an automated peer-review system to evaluate the work. Even though these works demonstrate that current LLMs can generate ideas judged to be more novel than those produced by human experts, Si et al. (2024) indicates that LLMs still exhibit weaknesses in feasibility and implementation details, suggesting a complementary rather than replacement role for LLMs in research. Therefore, we aim to design an autonomous agent pipeline that can assist humans toward implementing their own research ideas.
In this work, we introduce Agent Laboratory, an autonomous pipeline for accelerating the individual's ability to perform machine learning research. Unlike previous approaches, where agents participate in their own research ideation independent of human input (Baek et al. (2024); Lu et al. (2024b)), Agent Laboratory is designed to assist human scientists in executing their own research ideas using language agents. Agent Laboratory takes as input a human research idea and outputs a research report and code repository produced by autonomous language agents, allowing various levels of human involvement, where feedback can be provided at a frequency based on user preference. A detailed list of our contributions is provided below:
1. We introduce Agent Laboratory, an open-source LLM agent framework for accelerating the individual's ability to perform research in machine learning. In order to accommodate all users, Agent Laboratory is compute flexible, where various levels of compute can be allocated based on the individual's access to compute resources (e.g., CPU, GPU, memory) and model inference budget.
2. Human evaluators rated papers generated using Agent Laboratory across experimental quality, report quality, and usefulness, showing that while the o1-preview backend was perceived as the most useful, o1-mini achieved the highest experimental quality scores, and gpt-4o was behind in all metrics.
3. NeurIPS-style evaluations showed that o1-preview performed best among backends, particularly in clarity and soundness, according to human reviewers. However, a clear gap emerged between human and automated evaluations, with automated scores significantly overestimating quality (6.1/10 vs. 3.8/10 overall). Similar discrepancies were seen across clarity and contribution metrics, suggesting the need for human feedback to complement automated evaluations for more accurate assessments of research quality.
4. Co-pilot mode in Agent Laboratory was evaluated on custom and preselected topics, showing higher overall scores compared to autonomous mode. Co-pilot papers also saw trade-offs in experimental quality and usefulness, reflecting challenges in aligning agent outputs with researcher intent.
5. The co-pilot feature in Agent Laboratory is overall found to have high utility and usability when rated by human users, with most participants deciding to continue usage after their experience.
6. Detailed cost and inference time statistics, as well as the breakdown of cost per paper phase, are presented for different model backends, demonstrating that Agent Laboratory offers automatic research at a greatly reduced price compared with other works (only $2.33 USD per paper with a gpt-4o backend).
7. State-of-the-art performance on a subset of MLE-Bench challenges using the proposed mle-solver, achieving higher consistency and scoring compared to other solvers, and earning more medals, including gold and silver, than MLAB, OpenHands, and AIDE.

We hope that this work takes a step toward accelerating scientific discovery in machine learning, allowing researchers to allocate more effort toward creative ideation and experiment design rather than low-level coding and writing.
2. Background & Related Work
Large language models. The research agents in this paper are built on autoregressive large language models (LLMs), which are trained on extensive text corpora to predict conditional probabilities of token sequences, $p(x_t \mid x_{<t}; \theta)$, and generate text completions through sampling, where $x_t \sim \mathrm{softmax}(W \cdot h_t)$, with $h_t$ as the hidden state and $W$ as the learned weight matrix mapping to token probabilities. LLMs utilize transformer architectures (Vaswani (2017)) to capture long-range dependencies in text. These models, such as Claude (Anthropic (2024)), Llama (Dubey et al. (2024); Touvron et al. (2023a,b)), and ChatGPT (Achiam et al. (2023); Hurst et al. (2024); OpenAI (2022)), leverage vast datasets and scaling techniques, thus enabling them to perform a wide array of language-based tasks, such as translation, summarization, and reasoning, by generalizing patterns learned during pretraining to novel inputs (Brown (2020)).
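As an illustration of the sampling step above, the following minimal sketch (not taken from Agent Laboratory) draws the next token from $\mathrm{softmax}(W \cdot h_t)$; the hidden-state function `hidden_fn` is an assumed placeholder for a transformer forward pass.

```python
import numpy as np

def sample_next_token(h_t: np.ndarray, W: np.ndarray, temperature: float = 1.0) -> int:
    """Draw x_t ~ softmax(W . h_t): project the hidden state to vocabulary
    logits, normalize them into a distribution, and sample one token id."""
    logits = (W @ h_t) / temperature
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def generate(hidden_fn, W, prompt_ids, max_new_tokens=32):
    """Autoregressive loop: each token is conditioned on all previous tokens,
    i.e. p(x_t | x_<t; theta); hidden_fn stands in for the model forward pass."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        h_t = hidden_fn(ids)                # hidden state for the current prefix
        ids.append(sample_next_token(h_t, W))
    return ids
```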
LLM Agents. While LLMs demonstrate strong understanding and reasoning abilities, they face challenges when executing tasks in real-world scenarios. To overcome these limitations, their capabilities are extended through structured frameworks, enabling them to autonomously and semi-autonomously perform task execution (Chen et al. (2023b); Li et al. (2023); Qian et al. (2024); Wu et al. (2023)). These systems, referred to as agents, utilize techniques such as chain-of-thought prompting (Wei et al. (2022)), iterative refinement (Shinn et al. (2024)), self-improvement (Huang et al. (2022)), and external tool integration to execute complex workflows (Hao et al. (2024); Qin et al. (2023); Schick et al. (2023)). LLM agents have made remarkable progress in solving tasks of real-world significance, such as software engineering (Jimenez et al. (2023); Wang et al. (2024b); Yang et al. (2024)), cybersecurity (Abramovich et al. (2024); Fang et al. (2024); Wan et al. (2024)), and medical diagnosis (McDuff et al. (2023); Schmidgall et al. (2024); Tu et al. (2024)). There has also been progress in applying LLM agents to embodied problems such as autonomous robotics (Black et al. (2024); Brohan et al. (2022, 2023); Kim et al. (2024)), web tasks (Deng et al. (2024); Gur et al. (2023); He et al. (2024); Putta et al. (2024); Shi et al. (2017)), and game playing (AL et al. (2024); Feng et al. (2024); Wang et al. (2023)). For a broader overview of LLM agents, refer to Wang et al. (2024a).
Automated machine learning. Automated machine learning is an area of active research, with many approaches focused on using Kaggle, an online platform for machine learning competitions, as a benchmark for evaluating agent performance. Notable efforts include MLE-Bench (Chan et al. (2024)), DS-bench (Jing et al. (2024)), and MLAgentBench (Huang et al. (2024)), which propose using 75, 74, and 6 Kaggle challenges respectively as benchmarks to measure the abilities of ML agents in tasks such as data preparation, model development, and submission. Several ML "solvers" which can solve ML challenges have been introduced, such as AIDE (Schmidt et al. (2024)), CodeActAgent (referred to as "OpenHands") (Wang et al. (2024b)), and ResearchAgent (referred to as "MLAB") from MLAgentBench (Huang et al. (2024)), which automate feature implementation, bug fixing, and code refactoring with a high success rate. Agent K (Grosnit et al. (2024)) demonstrates the ability to solve Kaggle challenges at the human level with a challenge URL provided as input.
AI in Scientific Discovery. AI has been used to support scientific discovery across numerous disciplines for decades. For instance, AI has been used for discovery in mathematics (Romera-Paredes et al. (2024)), materials science (Merchant et al. (2023); Pyzer-Knapp et al. (2022); Szymanski et al. (2023)), chemistry (Hayes et al. (2024); Jumper et al. (2021)), algorithm discovery (Fawzi et al. (2022)), and computational biology (Ding et al. (2024)). These approaches position AI as a tool rather than as an agent performing autonomous research.
LLMs for research-related tasks. LLMs have demonstrated strong capabilities in diverse research-related tasks, such as code generation (Chen et al. (2021); Nijkamp et al. (2022)), end-to-end software development (Hai et al. (2024); Phan et al. (2024); Qian et al. (2023, 2024)), code generation for discovery (Chen et al. (2024b); Ghafarollahi & Buehler (2024a); Gu et al. (2024); Guo et al. (2024); Hu et al. (2024b); Ifargan et al. (2024); Majumder et al. (2024)), research question-answering (Chen et al. (2024a); Lála et al. (2023); Lin et al. (2024); Song et al. (2024)), research ideation (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Li et al. (2024a); Si et al. (2024)), automated paper reviewing (D'Arcy et al. (2024); Liang et al. (2024); Lu et al. (2024b); Weng et al. (2024)), literature search (Ajith et al. (2024); Kang & Xiong (2024); Li et al. (2024b); Press et al. (2024)), and predicting the outcome of experiments (Ashokkumar et al. (2024); Lehr et al. (2024); Luo et al. (2024); Manning et al. (2024); Zhang et al. (2024)). Although LLMs have made notable progress in solving the aforementioned tasks, ideation has struggled to progress, with some work showing that LLM ideation leads to greater novelty than humans (Si et al. (2024)), while others show reduced creativity (Chakrabarty et al. (2024)) and greater homogenization effects (Anderson et al. (2024); Zhou et al. (2024)) that may limit creative discovery without human guidance.

Additionally, research on human-AI collaboration has reached mixed conclusions about idea novelty (Ashkinaze et al. (2024); Liu et al. (2024); Padmakumar & He (2024)). These findings suggest that, with current LLMs, the strongest research systems would combine human-guided ideation with LLM-based workflows.
LLMs for autonomous research. Recent advancements in automated scientific workflows have focused on leveraging LLMs to emulate the process of research. Swanson et al. (2024) introduces a team of LLM agents working as scientists alongside a human researcher who provides high-level feedback, with the end result being novel nanobody binders aimed at addressing recent variants of SARS-CoV-2. ChemCrow (M. Bran et al. (2024)) and Coscientist (Boiko et al. (2023)) demonstrate the ability for autonomous ideation and experimentation in chemistry. ResearchAgent (Baek et al. (2024)) automates research idea generation, experiment design, and iterative refinement using feedback from reviewing agents aligned with human evaluation criteria. The AI Scientist (Lu et al. (2024a)) extends
Figure 2 | Agent Laboratory Workflow. This image illustrates the three primary phases of Agent Laboratory: Literature Review, Experimentation, and Report Writing, each featuring distinct tasks, tools, and human-agent roles. The pipeline integrates human input with LLM-driven agents, such as the PhD and Postdoc agents, which handle literature reviews, experimental planning, data preparation, and result interpretation. Specialized tools like mle-solver for experimentation and paper-solver for report generation automate tedious research tasks, enabling collaboration between human researchers and AI to produce high-quality research outputs.
this automation to encompass end-to-end scientific discovery, including coding, experiment execution, and automated peer review for manuscript generation. Despite these advancements, studies like Si et al. (2024) highlight limitations in the feasibility and implementation details of LLM ideation, indicating a complementary rather than replacement role for LLMs in autonomous research.
3. Agent Laboratory
Overview. Agent Laboratory begins with the independent collection and analysis of relevant research papers, progresses through collaborative planning and data preparation, and results in automated experimentation and comprehensive report generation. As shown in Figure 2, the overall workflow consists of three primary phases: (1) Literature Review, (2) Experimentation, and (3) Report Writing. In this section, we will introduce these phases in detail along with the corresponding involved agents. Furthermore, in Section 4, we will conduct qualitative and quantitative analyses to demonstrate the strengths of Agent Laboratory and its ability to generate high-quality research outputs.
3.1. Literature Review
Literature Review. The literature review phase involves gathering and curating relevant research papers for the given research idea to provide references for subsequent stages. During this process, the PhD agent utilizes the arXiv API to retrieve related papers and performs three main actions: summary, full text, and add paper. The summary action retrieves abstracts of the top 20 papers relevant to the initial query produced by the agent. The full text action extracts the complete content of specific papers, and the add paper action incorporates selected summaries or full texts into the curated review. This process is iterative rather than a single-step operation, as the agent performs multiple queries, evaluates the relevance of each paper based on its content, and refines the
selection to build a comprehensive review. Once the specified number of relevant texts (N = max) is reached via the add paper command, the curated review is finalized for use in subsequent phases.
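As a concrete illustration of the summary and add paper actions, the sketch below uses the third-party arxiv Python client; the LiteratureReview class and its method names are hypothetical stand-ins rather than the framework's actual interface.

```python
import arxiv  # third-party client for the arXiv API (pip install arxiv)

class LiteratureReview:
    """Hypothetical container for the PhD agent's curated review."""

    def __init__(self, max_papers: int):
        self.max_papers = max_papers  # number of relevant texts before finalizing
        self.entries = []             # curated abstracts or full texts

    def summary(self, query: str, top_k: int = 20):
        """summary action: fetch abstracts of the top-k papers for a query."""
        search = arxiv.Search(query=query, max_results=top_k,
                              sort_by=arxiv.SortCriterion.Relevance)
        return [(r.entry_id, r.title, r.summary) for r in arxiv.Client().results(search)]

    def add_paper(self, entry) -> bool:
        """add paper action: keep a relevant summary or full text; returns True
        once the specified number of texts has been reached."""
        if len(self.entries) < self.max_papers:
            self.entries.append(entry)
        return len(self.entries) >= self.max_papers
```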
3.2. Experimentation
Plan Formulation. The plan formulation phase focuses on creating a detailed, actionable research plan based on the literature review and research goal. During this phase, the PhD and Postdoc agents collaborate through dialogue to specify how to achieve the research objective, detailing the experimental components needed to complete the specified research idea, such as which machine learning models to implement, which datasets to use, and the high-level steps of the experiment. Once a consensus is reached, the Postdoc agent submits this plan using the plan command, which serves as a set of instructions for subsequent subtasks.
Data Preparation. The goal of the data preparation phase is to write code that prepares data for running experiments, using the instructions from the plan formulation stage as a guideline. The ML Engineer agent executes code using the python command and observes any printed output. The ML Engineer has access to HuggingFace datasets, searchable via the searchHF command. After agreeing on the finalized data preparation code, the SW Engineer agent submits it using the submit code command. Before the final submission proceeds, the code is first passed through a Python compiler to ensure that there are no compilation issues. This process is executed iteratively until the code is bug-free.
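To make the dataset search and compilation check concrete, a minimal sketch follows; it assumes the huggingface_hub client for dataset search, and the function names (search_hf, check_compiles) are illustrative rather than the framework's actual commands.

```python
from huggingface_hub import list_datasets

def search_hf(query: str, limit: int = 5) -> list[str]:
    """Illustrative stand-in for the searchHF command: free-text search over
    Hugging Face datasets, returning candidate dataset ids."""
    return [d.id for d in list_datasets(search=query, limit=limit)]

def check_compiles(code: str) -> bool:
    """Gate before code submission: the generated script must at least parse;
    compile() catches syntax errors without executing anything."""
    try:
        compile(code, "<data_prep>", "exec")
        return True
    except SyntaxError:
        return False
```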
Running Experiments. In the running experiments phase, the ML Engineer agent focuses on implementing and executing the experimental plan formulated prior. This is facilitated by mle-solver, a specialized module designed to generate, test, and refine machine learning code autonomously. mle-solver begins by producing initial code based on the research plan and insights from the literature review. For the first mle-solver step, the program is empty and a file must be generated from scratch, which is then used as the top-scoring program. The following processes describe the workflow of mle-solver:
A. Command Execution. During the command execution phase, an initial program is sampled from a maintained set of top-performing programs, which is represented by a single file during initialization. The mle-solver iteratively refines this program through two operations, REPLACE and EDIT, to better align the output with experimental objectives. The EDIT operation identifies a range of lines, substituting the code between the specified line numbers with newly generated code. In contrast, the REPLACE operation generates a completely new Python file.
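A minimal sketch of what these two operations could look like is shown below (illustrative only; the actual mle-solver command parsing is not reproduced here).

```python
def apply_edit(program: str, first_line: int, last_line: int, new_code: str) -> str:
    """EDIT: substitute the code between two 1-indexed, inclusive line numbers
    with newly generated code, leaving the rest of the file untouched."""
    lines = program.splitlines()
    patched = lines[:first_line - 1] + new_code.splitlines() + lines[last_line:]
    return "\n".join(patched)

def apply_replace(new_code: str) -> str:
    """REPLACE: discard the sampled program entirely and start from a new file."""
    return new_code
```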
B. Code Execution. After a code command is executed, the new program is passed through a compiler to check for runtime errors. If it successfully compiles, a score is returned and the list of top programs is updated if the score is higher than the existing programs. If the code does not compile, the agent attempts to repair the code for N_rep tries (N_rep = 3 in our experiments) before returning an error and moving on to a new code replacement.
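The execute-and-repair loop might be sketched as follows, where repair_fn is an assumed hook that asks the LLM to patch the file given the error message; this is illustrative rather than the framework's implementation.

```python
import subprocess
import sys

N_REP = 3  # repair attempts, matching N_rep = 3 in our experiments

def run_candidate(path: str, repair_fn) -> tuple[bool, str]:
    """Run a candidate program; on failure, request an LLM repair (repair_fn,
    assumed) and retry, giving up after N_REP repair attempts."""
    for attempt in range(N_REP + 1):
        proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if proc.returncode == 0:
            return True, proc.stdout       # success: output is passed to scoring
        if attempt < N_REP:
            repair_fn(path, proc.stderr)   # hypothetical LLM-based repair of the file
    return False, proc.stderr              # give up; move on to a new code replacement
```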
C. Program Scoring. If the code succeeds in compilation, it is sent to a scoring function which determines whether it is better than previously implemented experiment code. In order to obtain a program score, we implement a scoring function that uses an LLM reward model to assess the effectiveness of the ML code generated by mle-solver. The reward model, invoked as an LM, scores the program on a scale from 0 to 1, considering the outlined research plan, the produced code, and the observed output to determine how accurately the program adheres to
Figure 3 | Overview of the mle-solver workflow. This diagram details the iterative process used by the mle-solver to autonomously generate machine learning code. Beginning with external resources, the workflow integrates command execution (A), where new code is generated, followed by code execution (B) to compile and repair issues if needed. Program scoring (C) evaluates the generated code using a reward function, while self-reflection (D) helps refine future iterations based on results. Performance stabilization (E) ensures consistent outcomes by maintaining a pool of top-performing programs and iterative optimization.
the initial goals. A score of 1 is given for results with high alignment, and everything below falls on a spectrum of how closely the output and code match the planning goals. This process is similar to existing methods for LLM reasoning tree search (Yao et al. (2024)), where instead of a series of reasoning steps being traversed using self-evaluated LLM scoring, the set of possible programs is traversed (via EDIT and REPLACE commands) and the resulting program outcome is self-evaluated to determine if a program is worth building on. This is similar to the Solution Space Search of AIDE (Schmidt et al. (2024)); however, their method was specifically designed for Kaggle competitions and simply extracts the accuracy rather than scoring the research code and outcomes.
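A sketch of such an LLM-based scoring function is given below; the llm argument is an assumed text-completion callable, and the prompt wording is illustrative rather than the exact prompt used by mle-solver.

```python
SCORING_PROMPT = (
    "You are a reviewer. Given the research plan, the produced code, and its "
    "observed output, return a single number between 0 and 1 indicating how "
    "closely the program and its results follow the plan. Respond with only the number."
)

def score_program(llm, plan: str, code: str, output: str) -> float:
    """LLM-as-reward-model sketch: clamp the model's reply to [0, 1]; an
    unparsable reply is treated as zero alignment."""
    reply = llm(f"{SCORING_PROMPT}\n\nPlan:\n{plan}\n\nCode:\n{code}\n\nOutput:\n{output}")
    try:
        return min(max(float(reply.strip()), 0.0), 1.0)
    except ValueError:
        return 0.0
```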
D. Self Reflection. Whether the code succeeds or fails, a self-reflection is produced based on the experimental results or the encountered error signal (Renze & Guven (2024); Shinn et al. (2024)). Here, the mle-solver is prompted to reflect on the outcome of its actions. If the program failed to compile, the solver reflects on how to fix this issue in the next iterations. If it successfully compiles and returns a score, the solver will reflect on how to increase this score. These reflections are generated to improve future performance, ensuring that the system learns from errors, improving the quality and robustness of the generated code over iterative cycles.
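The reflection step could be sketched as below, again with llm standing in for an assumed text-completion callable; the prompts are illustrative.

```python
def reflect(llm, compiled: bool, score: float, feedback: str) -> str:
    """Produce a self-reflection string that is fed into the next solver
    iteration; feedback is the error message or the program output."""
    if not compiled:
        prompt = ("The program failed to compile with this error:\n"
                  f"{feedback}\nReflect on how to fix this issue in the next iteration.")
    else:
        prompt = (f"The program compiled and received a score of {score:.2f}. "
                  f"Output:\n{feedback}\nReflect on how to increase this score.")
    return llm(prompt)
```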
E. Performance Stabilization. To prevent performance drift, two mechanisms are implemented: top program sampling and batch-parallelization. In top program sampling, a collection of the highest-scoring programs is maintained, and one program is randomly sampled before executing a command, ensuring diversity while retaining quality. For batch-parallelization, each solver step involves making N modifications simultaneously, with the top modification selected to replace the lowest-scoring program in the top collection. These strategies use high-entropy sampling to modify the code, resulting in a balance between exploration of new solutions and
Figure 4 | Graphical outline of paper-solver. This diagram showcases the step-by-step process of generating and refining academic research reports using the Paper-Solver tool. The workflow starts with the creation of an initial report scaffold (A) by iteratively generating LaTeX-based sections, followed by updates to ensure structural completeness. (B) Research is performed through an arXiv tool during relevant sections. In the Report Editing phase (C), the language model applies targeted edits to improve the document, with LaTeX compilation verifying the integrity of changes. Finally, the completed report undergoes a reward-based evaluation during the Paper Review phase (D), ensuring alignment with academic standards and research goals.
refinement of existing ones in order to maintain stable code modifications.
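One stabilized solver step combining both mechanisms might be sketched as follows; propose_fn and score_fn are assumed hooks for generating a modified program and scoring it, respectively.

```python
import random

def solver_step(top_programs, propose_fn, score_fn, n_parallel: int):
    """Top program sampling plus batch-parallelization (sketch).
    top_programs is a list of (score, program) pairs."""
    _, base = random.choice(top_programs)                        # sample a top program
    candidates = [propose_fn(base) for _ in range(n_parallel)]   # N parallel modifications
    best_score, best_prog = max((score_fn(c), c) for c in candidates)
    worst = min(range(len(top_programs)), key=lambda i: top_programs[i][0])
    if best_score > top_programs[worst][0]:                      # keep only improving programs
        top_programs[worst] = (best_score, best_prog)
    return top_programs
```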
Results Interpretation. The goal of the results interpretation phase is to derive meaningful insights from experimental outcomes to inform the final report. The PhD and Postdoc agents discuss their understanding of the experimental results produced by mle-solver. Once they agree on a meaningful interpretation that could contribute to a compelling academic paper, the Postdoc agent submits it using the interpretation command, forming the basis for the report writing phase.
3.3. Report Writing
Report Writing. In the report writing phase, the PhD and Professor agents synthesize the research findings into a comprehensive academic report. This process is facilitated by a specialized module called paper-solver, which iteratively generates and refines the report. The paper-solver aims to act as a report generator, positioning the work that has been produced by previous stages of Agent Laboratory. paper-solver does not aim to entirely replace the academic paper-writing process, but rather to summarize the research that has been produced in a human-readable format so that the researcher using Agent Laboratory understands what has been accomplished. The output follows the standard structure of an academic paper, ensuring it meets conference submission requirements (for the paper scoring phase) while being clear and methodical. The following processes describe the workflow of paper-solver:
A. Initial Report Scaffold. The first task of the paper-solver is to generate an initial scaffold for the research paper. This scaffold outlines the document structure, dividing it into eight standardized sections: Abstract, Introduction, Background, Related Work, Methods, Experimental Setup, Results, and Discussion.
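As an illustration, an initial LaTeX scaffold over these eight sections could be generated along the following lines (a sketch; the actual paper-solver scaffolding prompts and template are not reproduced here).

```python
SECTIONS = ["Abstract", "Introduction", "Background", "Related Work",
            "Methods", "Experimental Setup", "Results", "Discussion"]

def build_scaffold(title: str) -> str:
    """Emit a compilable LaTeX skeleton with the eight standardized sections,
    which later phases would fill in and edit section by section."""
    body = ["\\documentclass{article}", f"\\title{{{title}}}",
            "\\begin{document}", "\\maketitle"]
    for name in SECTIONS:
        if name == "Abstract":
            body += ["\\begin{abstract}", "% placeholder", "\\end{abstract}"]
        else:
            body += [f"\\section{{{name}}}", "% placeholder"]
    body.append("\\end{document}")
    return "\n".join(body)
```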