Agent Laboratory: Using LLM Agents as Research Assistants

Samuel Schmidgall¹,², Yusheng Su¹, Ze Wang¹, Ximeng Sun¹, Jialian Wu¹, Xiaodong Yu¹, Jiang Liu¹, Zicheng Liu¹ and Emad Barsoum¹
¹AMD, ²Johns Hopkins University

Corresponding author(s): Samuel Schmidgall (sschmi46@)

arXiv:2501.04227v1 [cs.HC] 8 Jan 2025
Historically, scientific discovery has been a lengthy and costly process, demanding substantial time and resources from initial conception to final results. To accelerate scientific discovery, reduce research costs, and improve research quality, we introduce Agent Laboratory, an autonomous LLM-based framework capable of completing the entire research process. This framework accepts a human-provided research idea and progresses through three stages (literature review, experimentation, and report writing) to produce comprehensive research outputs, including a code repository and a research report, while enabling users to provide feedback and guidance at each stage. We deploy Agent Laboratory with various state-of-the-art LLMs and invite multiple researchers to assess its quality by participating in a survey, providing human feedback to guide the research process, and then evaluating the final paper. We found that: (1) Agent Laboratory driven by o1-preview generates the best research outcomes; (2) the generated machine learning code is able to achieve state-of-the-art performance compared to existing methods; (3) human involvement, providing feedback at each stage, significantly improves the overall quality of research; (4) Agent Laboratory significantly reduces research expenses, achieving an 84% decrease compared to previous autonomous research methods. We hope Agent Laboratory enables researchers to allocate more effort toward creative ideation rather than low-level coding and writing, ultimately accelerating scientific discovery.
https://AgentLaboratory.github.io
Figure 1 | Agent Laboratory takes as input a human research idea and a set of notes, provides this to a pipeline of specialized LLM-driven agents, and produces a research report and code repository.
1. Introduction
Scientists frequently face constraints that limit the number of research ideas they can explore at any given time, resulting in ideas being prioritized based on predicted impact. While this process helps determine which concepts are worth investing time in and how best to allocate limited resources, many high-quality ideas remain unexplored. If the process of exploring ideas had fewer limitations, researchers would be able to investigate multiple concepts simultaneously, increasing the likelihood of scientific discovery.
In an effort to achieve this, recent work has explored the capability of LLMs to perform research ideation and automated paper generation, where LLM agents perform the role of human scientists (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Lu et al. (2024a); Swanson et al. (2024)). The work of Baek et al. (2024) introduces ResearchAgent, which automatically generates research ideas, methods, and experiment designs, iteratively refining them through feedback from multiple reviewing agents that mirror peer discussions and leverage human-aligned evaluation criteria to improve the outputs. Lu et al. (2024a) explores fully automated paper generation, where The AI Scientist framework generates novel research ideas, writes code, conducts experiments, and creates a full scientific paper with an automated peer-review system to evaluate the work. Even though these works demonstrate that current LLMs can generate ideas judged to be more novel than those produced by human experts, Si et al. (2024) indicates that LLMs still exhibit weaknesses in feasibility and implementation details, suggesting a complementary rather than replacement role for LLMs in research. Therefore, we aim to design an autonomous agent pipeline that can assist humans toward implementing their own research ideas.
In this work, we introduce Agent Laboratory, an autonomous pipeline for accelerating the individual's ability to perform machine learning research. Unlike previous approaches, where agents participate in their own research ideation independent of human input (Baek et al. (2024); Lu et al. (2024b)), Agent Laboratory is designed to assist human scientists in executing their own research ideas using language agents. Agent Laboratory takes as input a human research idea and outputs a research report and code repository produced by autonomous language agents, allowing various levels of human involvement, where feedback can be provided at a frequency based on user preference. A detailed list of our contributions is provided below:
1. We introduce Agent Laboratory, an open-source LLM agent framework for accelerating the individual's ability to perform research in machine learning. In order to accommodate all users, Agent Laboratory is compute flexible, where various levels of compute can be allocated based on the individual's access to compute resources (e.g., CPU, GPU, memory) and model inference budget.
2. Human evaluators rated papers generated using Agent Laboratory across experimental quality, report quality, and usefulness, showing that while the o1-preview backend was perceived as the most useful, o1-mini achieved the highest experimental quality scores, and gpt-4o was behind in all metrics.
3. NeurIPS-style evaluations showed that o1-preview performed best among backends, particularly in clarity and soundness, according to human reviewers. However, a clear gap emerged between human and automated evaluations, with automated scores significantly overestimating quality (6.1/10 vs. 3.8/10 overall). Similar discrepancies were seen across clarity and contribution metrics, suggesting the need for human feedback to complement automated evaluations for more accurate assessments of research quality.
4. Co-pilot mode in Agent Laboratory was evaluated on custom and preselected topics, showing higher overall scores compared to autonomous mode. Co-pilot papers also saw trade-offs in experimental quality and usefulness, reflecting challenges in aligning agent outputs with researcher intent.
5. The co-pilot feature in Agent Laboratory is overall found to have high utility and usability when rated by human users, with most participants deciding to continue usage after their experience.
6. Detailed cost and inference time statistics, as well as the breakdown of cost per paper phase, are presented for different model backends, demonstrating that Agent Laboratory offers automatic research at a greatly reduced price compared with other works (only $2.33 USD per paper with a gpt-4o backend).
7. State-of-the-art performance on a subset of MLE-Bench challenges using the proposed mle-solver, achieving higher consistency and scoring compared to other solvers, and earning more medals, including gold and silver, than MLAB, OpenHands, and AIDE.

We hope that this work takes a step toward accelerating scientific discovery in machine learning, allowing researchers to allocate more effort toward creative ideation and experiment design rather than low-level coding and writing.
2. Background & Related Work
Large language models. The research agents in this paper are built on autoregressive large language models (LLMs), which are trained on extensive text corpora to predict conditional probabilities of token sequences, $p(x_t \mid x_{<t}; \theta)$, and generate text completions through sampling, where $x_t \sim \mathrm{softmax}(W \cdot h_t)$, with $h_t$ as the hidden state and $W$ as the learned weight matrix mapping to token probabilities. LLMs utilize transformer architectures (Vaswani (2017)) to capture long-range dependencies in text. These models, such as Claude (Anthropic (2024)), Llama (Dubey et al. (2024); Touvron et al. (2023a,b)), and ChatGPT (Achiam et al. (2023); Hurst et al. (2024); OpenAI (2022)), leverage vast datasets and scaling techniques, thus enabling them to perform a wide array of language-based tasks, such as translation, summarization, and reasoning, by generalizing patterns learned during pretraining to novel inputs (Brown (2020)).
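As an illustration of the sampling step above, the following minimal sketch (not taken from Agent Laboratory) draws the next token from $\mathrm{softmax}(W \cdot h_t)$; the hidden-state function `hidden_fn` is an assumed placeholder for a transformer forward pass.

```python
import numpy as np

def sample_next_token(h_t: np.ndarray, W: np.ndarray, temperature: float = 1.0) -> int:
    """Draw x_t ~ softmax(W . h_t): project the hidden state to vocabulary
    logits, normalize them into a distribution, and sample one token id."""
    logits = (W @ h_t) / temperature
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def generate(hidden_fn, W, prompt_ids, max_new_tokens=32):
    """Autoregressive loop: each token is conditioned on all previous tokens,
    i.e. p(x_t | x_<t; theta); hidden_fn stands in for the model forward pass."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        h_t = hidden_fn(ids)                # hidden state for the current prefix
        ids.append(sample_next_token(h_t, W))
    return ids
```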
LLM Agents. While LLMs demonstrate strong understanding and reasoning abilities, they face challenges when executing tasks in real-world scenarios. To overcome these limitations, their capabilities are extended through structured frameworks, enabling them to autonomously and semi-autonomously perform task execution (Chen et al. (2023b); Li et al. (2023); Qian et al. (2024); Wu et al. (2023)). These systems, referred to as agents, utilize techniques such as chain-of-thought prompting (Wei et al. (2022)), iterative refinement (Shinn et al. (2024)), self-improvement (Huang et al. (2022)), and external tool integration to execute complex workflows (Hao et al. (2024); Qin et al. (2023); Schick et al. (2023)). LLM agents have made remarkable progress in solving tasks of real-world significance, such as software engineering (Jimenez et al. (2023); Wang et al. (2024b); Yang et al. (2024)), cybersecurity (Abramovich et al. (2024); Fang et al. (2024); Wan et al. (2024)), and medical diagnosis (McDuff et al. (2023); Schmidgall et al. (2024); Tu et al. (2024)). There has also been progress in applying LLM agents to embodied problems such as autonomous robotics (Black et al. (2024); Brohan et al. (2022, 2023); Kim et al. (2024)), web tasks (Deng et al. (2024); Gur et al. (2023); He et al. (2024); Putta et al. (2024); Shi et al. (2017)), and game playing (AL et al. (2024); Feng et al. (2024); Wang et al. (2023)). For a broader overview of LLM agents, refer to Wang et al. (2024a).
Automated machine learning. Automated machine learning is an area of active research, with many approaches focused on using Kaggle, an online platform for machine learning competitions, as a benchmark for evaluating agent performance. Notable efforts include MLE-Bench (Chan et al. (2024)), DS-bench (Jing et al. (2024)), and MLAgentBench (Huang et al. (2024)), which propose using 75, 74, and 6 Kaggle challenges respectively as benchmarks to measure the abilities of ML agents in tasks such as data preparation, model development, and submission. Several ML "solvers" which can solve ML challenges have been introduced, such as AIDE (Schmidt et al. (2024)), CodeActAgent (referred to as "OpenHands") (Wang et al. (2024b)), and ResearchAgent (referred to as "MLAB") from MLAgentBench (Huang et al. (2024)), which automate feature implementation, bug fixing, and code refactoring with a high success rate. Agent K (Grosnit et al. (2024)) demonstrates the ability to solve Kaggle challenges at the human level with a challenge URL provided as input.
AI in Scientific Discovery. AI has been used to support scientific discovery across numerous disciplines for decades. For instance, AI has been used for discovery in mathematics (Romera-Paredes et al. (2024)), materials science (Merchant et al. (2023); Pyzer-Knapp et al. (2022); Szymanski et al. (2023)), chemistry (Hayes et al. (2024); Jumper et al. (2021)), algorithm discovery (Fawzi et al. (2022)), and computational biology (Ding et al. (2024)). These approaches position AI as a tool rather than as an agent performing autonomous research.
LLMs for research-related tasks. LLMs have demonstrated strong capabilities in diverse research-related tasks, such as code generation (Chen et al. (2021); Nijkamp et al. (2022)), end-to-end software development (Hai et al. (2024); Phan et al. (2024); Qian et al. (2023, 2024)), code generation for discovery (Chen et al. (2024b); Ghafarollahi & Buehler (2024a); Gu et al. (2024); Guo et al. (2024); Hu et al. (2024b); Ifargan et al. (2024); Majumder et al. (2024)), research question-answering (Chen et al. (2024a); Lála et al. (2023); Lin et al. (2024); Song et al. (2024)), research ideation (Baek et al. (2024); Ghafarollahi & Buehler (2024b); Li et al. (2024a); Si et al. (2024)), automated paper reviewing (D'Arcy et al. (2024); Liang et al. (2024); Lu et al. (2024b); Weng et al. (2024)), literature search (Ajith et al. (2024); Kang & Xiong (2024); Li et al. (2024b); Press et al. (2024)), and predicting the outcome of experiments (Ashokkumar et al. (2024); Lehr et al. (2024); Luo et al. (2024); Manning et al. (2024); Zhang et al. (2024)). Although LLMs have made notable progress in solving the aforementioned tasks, ideation has struggled to progress, with some work showing that LLM ideation leads to greater novelty than humans (Si et al. (2024)), while others show reduced creativity (Chakrabarty et al. (2024)) and greater homogenization effects (Anderson et al. (2024); Zhou et al. (2024)) that may limit creative discovery without human guidance.

Additionally, research on human-AI collaboration has reached mixed conclusions about idea novelty (Ashkinaze et al. (2024); Liu et al. (2024); Padmakumar & He (2024)). These findings suggest that, with current LLMs, the strongest research systems would combine human-guided ideation with LLM-based workflows.
LLMs for autonomous research. Recent advancements in automated scientific workflows have focused on leveraging LLMs to emulate the process of research. Swanson et al. (2024) introduces a team of LLM agents working as scientists alongside a human researcher who provides high-level feedback, with the end result being novel nanobody binders aimed at addressing recent variants of SARS-CoV-2. ChemCrow (M. Bran et al. (2024)) and Coscientist (Boiko et al. (2023)) demonstrate the ability for autonomous ideation and experimentation in chemistry. ResearchAgent (Baek et al. (2024)) automates research idea generation, experiment design, and iterative refinement using feedback from reviewing agents aligned with human evaluation criteria. The AI Scientist (Lu et al. (2024a)) extends
Figure 2 | Agent Laboratory Workflow. This image illustrates the three primary phases of Agent Laboratory: Literature Review, Experimentation, and Report Writing, each featuring distinct tasks, tools, and human-agent roles. The pipeline integrates human input with LLM-driven agents, such as the PhD and Postdoc agents, which handle literature reviews, experimental planning, data preparation, and result interpretation. Specialized tools like mle-solver for experimentation and paper-solver for report generation automate tedious research tasks, enabling collaboration between human researchers and AI to produce high-quality research outputs.
this automation to encompass end-to-end scientific discovery, including coding, experiment execution, and automated peer review for manuscript generation. Despite these advancements, studies like Si et al. (2024) highlight limitations in the feasibility and implementation details of LLM ideation, indicating a complementary rather than replacement role for LLMs in autonomous research.
3. Agent Laboratory
Overview. Agent Laboratory begins with the independent collection and analysis of relevant research papers, progresses through collaborative planning and data preparation, and results in automated experimentation and comprehensive report generation. As shown in Figure 2, the overall workflow consists of three primary phases: (1) Literature Review, (2) Experimentation, and (3) Report Writing. In this section, we will introduce these phases in detail along with the corresponding involved agents. Furthermore, in Section 4, we will conduct qualitative and quantitative analyses to demonstrate the strengths of Agent Laboratory and its ability to generate high-quality research outputs.
3.1. Literature Review
Literature Review. The literature review phase involves gathering and curating relevant research papers for the given research idea to provide references for subsequent stages. During this process, the PhD agent utilizes the arXiv API to retrieve related papers and performs three main actions: summary, full text, and add paper. The summary action retrieves abstracts of the top 20 papers relevant to the initial query produced by the agent. The full text action extracts the complete content of specific papers, and the add paper action incorporates selected summaries or full texts into the curated review. This process is iterative rather than a single-step operation, as the agent performs multiple queries, evaluates the relevance of each paper based on its content, and refines the
selection to build a comprehensive review. Once the specified number of relevant texts (N = max) is reached via the add paper command, the curated review is finalized for use in subsequent phases.
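As a concrete illustration of the summary and add paper actions, the sketch below uses the third-party arxiv Python client; the LiteratureReview class and its method names are hypothetical stand-ins rather than the framework's actual interface.

```python
import arxiv  # third-party client for the arXiv API (pip install arxiv)

class LiteratureReview:
    """Hypothetical container for the PhD agent's curated review."""

    def __init__(self, max_papers: int):
        self.max_papers = max_papers  # number of relevant texts before finalizing
        self.entries = []             # curated abstracts or full texts

    def summary(self, query: str, top_k: int = 20):
        """summary action: fetch abstracts of the top-k papers for a query."""
        search = arxiv.Search(query=query, max_results=top_k,
                              sort_by=arxiv.SortCriterion.Relevance)
        return [(r.entry_id, r.title, r.summary) for r in arxiv.Client().results(search)]

    def add_paper(self, entry) -> bool:
        """add paper action: keep a relevant summary or full text; returns True
        once the specified number of texts has been reached."""
        if len(self.entries) < self.max_papers:
            self.entries.append(entry)
        return len(self.entries) >= self.max_papers
```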
3.2. Experimentation
Plan Formulation. The plan formulation phase focuses on creating a detailed, actionable research plan based on the literature review and research goal. During this phase, the PhD and Postdoc agents collaborate through dialogue to specify how to achieve the research objective, detailing the experimental components needed to complete the specified research idea, such as which machine learning models to implement, which datasets to use, and the high-level steps of the experiment. Once a consensus is reached, the Postdoc agent submits this plan using the plan command, which serves as a set of instructions for subsequent subtasks.
Data Preparation. The goal of the data preparation phase is to write code that prepares data for running experiments, using the instructions from the plan formulation stage as a guideline. The ML Engineer agent executes code using the python command and observes any printed output. The ML Engineer has access to HuggingFace datasets, searchable via the searchHF command. After agreeing on the finalized data preparation code, the SW Engineer agent submits it using the submit code command. Before the final submission proceeds, the code is first passed through a Python compiler to ensure that there are no compilation issues. This process is executed iteratively until the code is bug-free.
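To make the dataset search and compilation check concrete, a minimal sketch follows; it assumes the huggingface_hub client for dataset search, and the function names (search_hf, check_compiles) are illustrative rather than the framework's actual commands.

```python
from huggingface_hub import list_datasets

def search_hf(query: str, limit: int = 5) -> list[str]:
    """Illustrative stand-in for the searchHF command: free-text search over
    Hugging Face datasets, returning candidate dataset ids."""
    return [d.id for d in list_datasets(search=query, limit=limit)]

def check_compiles(code: str) -> bool:
    """Gate before code submission: the generated script must at least parse;
    compile() catches syntax errors without executing anything."""
    try:
        compile(code, "<data_prep>", "exec")
        return True
    except SyntaxError:
        return False
```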
Running Experiments. In the running experiments phase, the ML Engineer agent focuses on implementing and executing the experimental plan formulated prior. This is facilitated by mle-solver, a specialized module designed to generate, test, and refine machine learning code autonomously. mle-solver begins by producing initial code based on the research plan and insights from the literature review. For the first mle-solver step, the program is empty and a file must be generated from scratch, which is then used as the top-scoring program. The following processes describe the workflow of mle-solver:
A. Command Execution. During the command execution phase, an initial program is sampled from a maintained set of top-performing programs, which is represented by a single file during initialization. The mle-solver iteratively refines this program through two operations, REPLACE and EDIT, to better align the output with experimental objectives. The EDIT operation identifies a range of lines, substituting the code between the specified line numbers with newly generated code. In contrast, the REPLACE operation generates a completely new Python file.
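A minimal sketch of what these two operations could look like is shown below (illustrative only; the actual mle-solver command parsing is not reproduced here).

```python
def apply_edit(program: str, first_line: int, last_line: int, new_code: str) -> str:
    """EDIT: substitute the code between two 1-indexed, inclusive line numbers
    with newly generated code, leaving the rest of the file untouched."""
    lines = program.splitlines()
    patched = lines[:first_line - 1] + new_code.splitlines() + lines[last_line:]
    return "\n".join(patched)

def apply_replace(new_code: str) -> str:
    """REPLACE: discard the sampled program entirely and start from a new file."""
    return new_code
```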
B. Code Execution. After a code command is executed, the new program is passed through a compiler to check for runtime errors. If it successfully compiles, a score is returned and the list of top programs is updated if the score is higher than the existing programs. If the code does not compile, the agent attempts to repair the code for N_rep tries (N_rep = 3 in our experiments) before returning an error and moving on to a new code replacement.
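The execute-and-repair loop might be sketched as follows, where repair_fn is an assumed hook that asks the LLM to patch the file given the error message; this is illustrative rather than the framework's implementation.

```python
import subprocess
import sys

N_REP = 3  # repair attempts, matching N_rep = 3 in our experiments

def run_candidate(path: str, repair_fn) -> tuple[bool, str]:
    """Run a candidate program; on failure, request an LLM repair (repair_fn,
    assumed) and retry, giving up after N_REP repair attempts."""
    for attempt in range(N_REP + 1):
        proc = subprocess.run([sys.executable, path], capture_output=True, text=True)
        if proc.returncode == 0:
            return True, proc.stdout       # success: output is passed to scoring
        if attempt < N_REP:
            repair_fn(path, proc.stderr)   # hypothetical LLM-based repair of the file
    return False, proc.stderr              # give up; move on to a new code replacement
```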
C. Program Scoring. If the code succeeds in compilation, it is sent to a scoring function which determines whether it is better than previously implemented experiment code. In order to obtain a program score, we implement a scoring function that uses an LLM reward model to assess the effectiveness of the ML code generated by mle-solver. The reward model, invoked as an LM, scores the program on a scale from 0 to 1, considering the outlined research plan, the produced code, and the observed output to determine how accurately the program adheres to
Figure 3 | Overview of the mle-solver workflow. This diagram details the iterative process used by the mle-solver to autonomously generate machine learning code. Beginning with external resources, the workflow integrates command execution (A), where new code is generated, followed by code execution (B) to compile and repair issues if needed. Program scoring (C) evaluates the generated code using a reward function, while self-reflection (D) helps refine future iterations based on results. Performance stabilization (E) ensures consistent outcomes by maintaining a pool of top-performing programs and iterative optimization.
the initial goals. A score of 1 is given for results with high alignment, and everything below falls on a spectrum of how closely the output and code match the planning goals. This process is similar to existing methods for LLM reasoning tree search (Yao et al. (2024)), where instead of a series of reasoning steps being traversed using self-evaluated LLM scoring, the set of possible programs is traversed (via EDIT and REPLACE commands) and the resulting program outcome is self-evaluated to determine if a program is worth building on. This is similar to the Solution Space Search of AIDE (Schmidt et al. (2024)); however, their method was specifically designed for Kaggle competitions and simply extracts the accuracy rather than scoring the research code and outcomes.
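A sketch of such an LLM-based scoring function is given below; the llm argument is an assumed text-completion callable, and the prompt wording is illustrative rather than the exact prompt used by mle-solver.

```python
SCORING_PROMPT = (
    "You are a reviewer. Given the research plan, the produced code, and its "
    "observed output, return a single number between 0 and 1 indicating how "
    "closely the program and its results follow the plan. Respond with only the number."
)

def score_program(llm, plan: str, code: str, output: str) -> float:
    """LLM-as-reward-model sketch: clamp the model's reply to [0, 1]; an
    unparsable reply is treated as zero alignment."""
    reply = llm(f"{SCORING_PROMPT}\n\nPlan:\n{plan}\n\nCode:\n{code}\n\nOutput:\n{output}")
    try:
        return min(max(float(reply.strip()), 0.0), 1.0)
    except ValueError:
        return 0.0
```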
D. Self Reflection. Whether the code succeeds or fails, a self-reflection is produced based on the experimental results or the encountered error signal (Renze & Guven (2024); Shinn et al. (2024)). Here, the mle-solver is prompted to reflect on the outcome of its actions. If the program failed to compile, the solver reflects on how to fix this issue in the next iterations. If it successfully compiles and returns a score, the solver will reflect on how to increase this score. These reflections are generated to improve future performance, ensuring that the system learns from errors, improving the quality and robustness of the generated code over iterative cycles.
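The reflection step could be sketched as below, again with llm standing in for an assumed text-completion callable; the prompts are illustrative.

```python
def reflect(llm, compiled: bool, score: float, feedback: str) -> str:
    """Produce a self-reflection string that is fed into the next solver
    iteration; feedback is the error message or the program output."""
    if not compiled:
        prompt = ("The program failed to compile with this error:\n"
                  f"{feedback}\nReflect on how to fix this issue in the next iteration.")
    else:
        prompt = (f"The program compiled and received a score of {score:.2f}. "
                  f"Output:\n{feedback}\nReflect on how to increase this score.")
    return llm(prompt)
```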
E. Performance Stabilization. To prevent performance drift, two mechanisms are implemented: top program sampling and batch-parallelization. In top program sampling, a collection of the highest-scoring programs is maintained, and one program is randomly sampled before executing a command, ensuring diversity while retaining quality. For batch-parallelization, each solver step involves making N modifications simultaneously, with the top modification selected to replace the lowest-scoring program in the top collection. These strategies use high-entropy sampling to modify the code, resulting in a balance between exploration of new solutions and
Figure 4 | Graphical outline of paper-solver. This diagram showcases the step-by-step process of generating and refining academic research reports using the Paper-Solver tool. The workflow starts with the creation of an initial report scaffold (A) by iteratively generating LaTeX-based sections, followed by updates to ensure structural completeness. (B) Research is performed through an arXiv tool during relevant sections. In the Report Editing phase (C), the language model applies targeted edits to improve the document, with LaTeX compilation verifying the integrity of changes. Finally, the completed report undergoes a reward-based evaluation during the Paper Review phase (D), ensuring alignment with academic standards and research goals.
refinement of existing ones in order to maintain stable code modifications.
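One stabilized solver step combining both mechanisms might be sketched as follows; propose_fn and score_fn are assumed hooks for generating a modified program and scoring it, respectively.

```python
import random

def solver_step(top_programs, propose_fn, score_fn, n_parallel: int):
    """Top program sampling plus batch-parallelization (sketch).
    top_programs is a list of (score, program) pairs."""
    _, base = random.choice(top_programs)                        # sample a top program
    candidates = [propose_fn(base) for _ in range(n_parallel)]   # N parallel modifications
    best_score, best_prog = max((score_fn(c), c) for c in candidates)
    worst = min(range(len(top_programs)), key=lambda i: top_programs[i][0])
    if best_score > top_programs[worst][0]:                      # keep only improving programs
        top_programs[worst] = (best_score, best_prog)
    return top_programs
```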
Results Interpretation. The goal of the results interpretation phase is to derive meaningful insights from experimental outcomes to inform the final report. The PhD and Postdoc agents discuss their understanding of the experimental results produced by mle-solver. Once they agree on a meaningful interpretation that could contribute to a compelling academic paper, the Postdoc agent submits it using the interpretation command, forming the basis for the report writing phase.
3.3. Report Writing
Report Writing. In the report writing phase, the PhD and Professor agents synthesize the research findings into a comprehensive academic report. This process is facilitated by a specialized module called paper-solver, which iteratively generates and refines the report. The paper-solver aims to act as a report generator, positioning the work that has been produced by previous stages of Agent Laboratory. paper-solver does not aim to entirely replace the academic paper-writing process, but rather to summarize the research that has been produced in a human-readable format so that the researcher using Agent Laboratory understands what has been accomplished. The output follows the standard structure of an academic paper, ensuring it meets conference submission requirements (for the paper scoring phase) while being clear and methodical. The following processes describe the workflow of paper-solver:
A. Initial Report Scaffold. The first task of the paper-solver is to generate an initial scaffold for the research paper. This scaffold outlines the document structure, dividing it into eight standardized sections: Abstract, Introduction, Background, Related Work, Methods, Experimental Setup, Results, and Discussion.
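As an illustration, an initial LaTeX scaffold over these eight sections could be generated along the following lines (a sketch; the actual paper-solver scaffolding prompts and template are not reproduced here).

```python
SECTIONS = ["Abstract", "Introduction", "Background", "Related Work",
            "Methods", "Experimental Setup", "Results", "Discussion"]

def build_scaffold(title: str) -> str:
    """Emit a compilable LaTeX skeleton with the eight standardized sections,
    which later phases would fill in and edit section by section."""
    body = ["\\documentclass{article}", f"\\title{{{title}}}",
            "\\begin{document}", "\\maketitle"]
    for name in SECTIONS:
        if name == "Abstract":
            body += ["\\begin{abstract}", "% placeholder", "\\end{abstract}"]
        else:
            body += [f"\\section{{{name}}}", "% placeholder"]
    body.append("\\end{document}")
    return "\n".join(body)
```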