




Agents
Authors: Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic
Acknowledgements
Reviewers and Contributors
Evan Huang
Emily Xue
Olcan Sercinoglu
Sebastian Riedel
Satinder Baveja
Antonio Gulli
Anant Nawalgaria
Curators and Editors
Antonio Gulli
Anant Nawalgaria
Grace Mollison
Technical Writer
Joey Haymaker
Designer
Michael Lanning
September 2024
Table of contents
Introduction
What is an agent?
The model
The tools
The orchestration layer
Agents vs. models
Cognitive architectures: How agents operate
Tools: Our keys to the outside world
Extensions
Sample Extensions
Functions
Use cases
Function sample code
Data stores
Implementation and application
Tools recap
Enhancing model performance with targeted learning
Agent quick start with LangChain
Production applications with Vertex AI agents
Summary
Endnotes
Introduction
Humans are fantastic at messy pattern recognition tasks. However, they often rely on tools, such as books, Google Search, or a calculator, to supplement their prior knowledge before arriving at a conclusion. Just like humans, Generative AI models can be trained to use tools to access real-time information or suggest a real-world action. For example, a model can leverage a database retrieval tool to access specific information, like a customer's purchase history, so it can generate tailored shopping recommendations. Alternatively, based on a user's query, a model can make various API calls to send an email response to a colleague or complete a financial transaction on your behalf. To do so, the model must not only have access to a set of external tools, it needs the ability to plan and execute any task in a self-directed fashion. This combination of reasoning, logic, and access to external information that are all connected to a Generative AI model invokes the concept of an agent, or a program that extends beyond the standalone capabilities of a Generative AI model. This whitepaper dives into all these and associated aspects in more detail.
What is an agent?
In its most fundamental form, a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal. Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve. Agents can also be proactive in their approach to reaching their goals. Even in the absence of explicit instruction sets from a human, an agent can reason about what it should do next to achieve its ultimate goal. While the notion of agents in AI is quite general and powerful, this whitepaper focuses on the specific types of agents that Generative AI models are capable of building at the time of publication.
In order to understand the inner workings of an agent, let’s first introduce the foundational components that drive the agent’s behavior, actions, and decision making. The combination of these components can be described as a cognitive architecture, and there are many such architectures that can be achieved by the mixing and matching of these components. Focusing on the core functionalities, there are three essential components in an agent’s cognitive architecture as shown in Figure 1.
Figure 1. General agent architecture and components
The model
In the scope of an agent, a model refers to the language model (LM) that will be utilized as the centralized decision maker for agent processes. The model used by an agent can be one or multiple LMs of any size (small / large) that are capable of following instruction-based reasoning and logic frameworks, like ReAct, Chain-of-Thought, or Tree-of-Thoughts. Models can be general purpose, multimodal or fine-tuned based on the needs of your specific agent architecture. For best production results, you should leverage a model that best fits your desired end application and, ideally, has been trained on data signatures associated with the tools that you plan to use in the cognitive architecture. It’s important to note that the model is typically not trained with the specific configuration settings (i.e. tool choices, orchestration/reasoning setup) of the agent. However, it’s possible to further refine the model for the agent’s tasks by providing it with examples that showcase the agent’s capabilities, including instances of the agent using specific tools or reasoning steps in various contexts.
The tools
Foundational models, despite their impressive text and image generation, remain constrained by their inability to interact with the outside world. Tools bridge this gap, empowering agents to interact with external data and services while unlocking a wider range of actions beyond that of the underlying model alone. Tools can take a variety of forms and have varying depths of complexity, but typically align with common web API methods like GET, POST, PATCH, and DELETE. For example, a tool could update customer information in a database or fetch weather data to influence a travel recommendation that the agent is providing to the user. With tools, agents can access and process real-world information. This empowers them to support more specialized systems like retrieval augmented generation (RAG), which significantly extends an agent’s capabilities beyond what the foundational model can achieve on its own. We’ll discuss tools in more detail below, but the most important thing to understand is that tools bridge the gap between the agent’s internal capabilities and the external world, unlocking a broader range of possibilities.
The orchestration layer
The orchestration layer describes a cyclical process that governs how the agent takes in information, performs some internal reasoning, and uses that reasoning to inform its next action or decision. In general, this loop will continue until an agent has reached its goal or a stopping point. The complexity of the orchestration layer can vary greatly depending on the agent and task it’s performing. Some loops can be simple calculations with decision rules, while others may contain chained logic, involve additional machine learning algorithms, or implement other probabilistic reasoning techniques. We’ll discuss more about the detailed implementation of the agent orchestration layers in the cognitive architecture section.
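To make the cycle concrete, here is a minimal sketch of the simplest kind of loop mentioned above, one driven by plain decision rules. It is an illustration only: the names `LoopState`, `reason`, `act`, and `run_loop`, and the toy goal of reaching a target number, are assumptions made for this example and do not come from any agent framework.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Everything the loop remembers between iterations (hypothetical)."""
    goal: int
    value: int = 0
    history: list = field(default_factory=list)


def reason(state: LoopState) -> str:
    """Decision rule standing in for the agent's internal reasoning."""
    if state.value < state.goal:
        return "increment"
    if state.value > state.goal:
        return "decrement"
    return "stop"


def act(state: LoopState, action: str) -> None:
    """Apply the chosen action and record it in the history."""
    if action == "increment":
        state.value += 1
    elif action == "decrement":
        state.value -= 1
    state.history.append(action)


def run_loop(state: LoopState, max_steps: int = 20) -> LoopState:
    """Repeat reason -> act until the goal or the step limit is reached."""
    for _ in range(max_steps):
        action = reason(state)
        if action == "stop":  # goal reached
            break
        act(state, action)
    return state


final = run_loop(LoopState(goal=3))
print(final.value, final.history)
```

The `max_steps` limit plays the role of the "stopping point": the loop ends either when the goal is met or when the budget runs out. A real orchestration layer would replace `reason` with a model call and `act` with tool invocations.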
Agents vs. models
To gain a clearer understanding of the distinction between agents and models, consider the following chart:
| Models | Agents |
| --- | --- |
| Knowledge is limited to what is available in their training data. | Knowledge is extended through the connection with external systems via tools. |
| Single inference / prediction based on the user query. Unless explicitly implemented for the model, there is no management of session history or continuous context (i.e. chat history). | Managed session history (i.e. chat history) to allow for multi-turn inference / prediction based on user queries and decisions made in the orchestration layer. In this context, a ‘turn’ is defined as an interaction between the interacting system and the agent (i.e. 1 incoming event/query and 1 agent response). |
| No native tool implementation. | Tools are natively implemented in agent architecture. |
| No native logic layer implemented. Users can form prompts as simple questions or use reasoning frameworks (CoT, ReAct, etc.) to form complex prompts to guide the model in prediction. | Native cognitive architecture that uses reasoning frameworks like CoT, ReAct, or other pre-built agent frameworks like LangChain. |
Cognitive architectures: How agents operate
Imagine a chef in a busy kitchen. Their goal is to create delicious dishes for restaurant patrons, which involves some cycle of planning, execution, and adjustment.
• They gather information, like the patron’s order and what ingredients are in the pantry and refrigerator.
• They perform some internal reasoning about what dishes and flavor profiles they can create based on the information they have just gathered.
• They take action to create the dish: chopping vegetables, blending spices, searing meat.
At each stage in the process the chef makes adjustments as needed, refining their plan as ingredients are depleted or customer feedback is received, and uses the set of previous outcomes to determine the next plan of action. This cycle of information intake, planning, executing, and adjusting describes a unique cognitive architecture that the chef employs to reach their goal.
Just like the chef, agents can use cognitive architectures to reach their end goals by iteratively processing information, making informed decisions, and refining next actions based on previous outputs. At the core of agent cognitive architectures lies the orchestration layer, responsible for maintaining memory, state, reasoning and planning. It uses the rapidly evolving field of prompt engineering and associated frameworks to guide reasoning and planning, enabling the agent to interact more effectively with its environment and complete tasks. Research in the area of prompt engineering frameworks and task planning for language models is rapidly evolving, yielding a variety of promising approaches. While not an exhaustive list, these are a few of the most popular frameworks and reasoning techniques available at the time of this publication:
• ReAct, a prompt engineering framework that provides a thought process strategy for language models to Reason and take action on a user query, with or without in-context examples. ReAct prompting has been shown to outperform several SOTA baselines and improve human interpretability and trustworthiness of LLMs.
• Chain-of-Thought (CoT), a prompt engineering framework that enables reasoning capabilities through intermediate steps. There are various sub-techniques of CoT including self-consistency, active-prompt, and multimodal CoT that each have strengths and weaknesses depending on the specific application.
• Tree-of-thoughts (ToT), a prompt engineering framework that is well suited for exploration or strategic lookahead tasks. It generalizes over chain-of-thought prompting and allows the model to explore various thought chains that serve as intermediate steps for general problem solving with language models.
Agents can utilize one of the above reasoning techniques, or many other techniques, to choose the next best action for the given user request. For example, let’s consider an agent that is programmed to use the ReAct framework to choose the correct actions and tools for the user query. The sequence of events might go something like this:
1. User sends query to the agent
2. Agent begins the ReAct sequence
3. The agent provides a prompt to the model, asking it to generate one of the next ReAct steps and its corresponding output:
   a. Question: The input question from the user query, provided with the prompt
   b. Thought: The model’s thoughts about what it should do next
   c. Action: The model’s decision on what action to take next
      i. This is where tool choice can occur
      ii. For example, an action could be one of [Flights, Search, Code, None], where the first 3 represent a known tool that the model can choose, and the last represents “no tool choice”
   d. Action input: The model’s decision on what inputs to provide to the tool (if any)
   e. Observation: The result of the action / action input sequence
      i. This thought / action / action input / observation could repeat N-times as needed
   f. Final answer: The model’s final answer to provide to the original user query
4. The ReAct loop concludes and a final answer is provided back to the user
Figure 2. Example agent with ReAct reasoning in the orchestration layer
As shown in Figure 2, the model, tools, and agent configuration work together to provide a grounded, concise response back to the user based on the user’s original query. While the model could have guessed at an answer (hallucinated) based on its prior knowledge, it instead used a tool (Flights) to search for real-time external information. This additional information was provided to the model, allowing it to make a more informed decision based on real factual data and to summarize this information back to the user.
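The question / thought / action / action input / observation sequence described above can be sketched in a few lines of Python. This is a hedged illustration, not how any particular framework implements ReAct: `scripted_model` is a hard-coded stand-in for a real language model call, `flights_tool` is a hypothetical stub rather than a real flight search API, and the transcript format is an assumption made for readability.

```python
from typing import Callable, Dict, List, Tuple


def flights_tool(query: str) -> str:
    """Hypothetical stand-in for a real flight search API."""
    return f"2 flights found for '{query}'"


# Known tools the model may choose from; "None" means no tool choice.
TOOLS: Dict[str, Callable[[str], str]] = {"Flights": flights_tool}


def scripted_model(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the model: returns (thought, action, action_input).

    A real agent would send the transcript to a model and parse its reply.
    """
    if not any(line.startswith("Observation:") for line in transcript):
        return ("I need live flight data.", "Flights", "Austin to Zurich")
    return ("I have enough information.", "None", "")


def react(question: str, max_turns: int = 5) -> Tuple[str, List[str]]:
    """Run the thought/action/observation loop until a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_turns):
        thought, action, action_input = scripted_model(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}")
        if action == "None":
            # No tool needed: answer from the most recent observation.
            seen = [l for l in transcript if l.startswith("Observation: ")]
            answer = seen[-1][len("Observation: "):] if seen else "No answer."
            transcript.append(f"Final answer: {answer}")
            return answer, transcript
        transcript.append(f"Action input: {action_input}")
        transcript.append(f"Observation: {TOOLS[action](action_input)}")
    return "Stopped without an answer.", transcript


answer, transcript = react("Find flights from Austin to Zurich")
print("\n".join(transcript))
```

The point of the sketch is the control flow: the tool result is fed back into the transcript as an observation, and the model's next step is chosen with that observation in view rather than from prior knowledge alone.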
In summary, the quality of agent responses can be tied directly to the model’s ability to reason about and act on these various tasks, including its ability to select the right tools, and to how well those tools have been defined. Like a chef crafting a dish with fresh ingredients and attention to customer feedback, agents rely on sound reasoning and reliable information to deliver optimal results. In the next section, we’ll dive into the various ways agents connect with fresh data.
Tools: Our keys to the outside world
While language models excel at processing information, they lack the ability to directly perceive and influence the real world. This limits their usefulness in situations requiring interaction with external systems or data. This means that, in a sense, a language model is only as good as what it has learned from its training data. But regardless of how much data we throw at a model, it still lacks the fundamental ability to interact with the outside world. So how can we empower our models to have real-time, context-aware interaction with external systems? Functions, Extensions, Data Stores and Plugins are all ways to provide this critical capability to the model.
While they go by many names, tools are what create a link between our foundational models and the outside world. This link to external systems and data allows our agent to perform a wider variety of tasks and do so with more accuracy and reliability. For instance, tools can enable agents to adjust smart home settings, update calendars, fetch user information from a database, or send emails based on a specific set of instructions.
As of the date of this publication, there are three primary tool types that Google models are able to interact with: Extensions, Functions, and Data Stores. By equipping agents with tools, we unlock a vast potential for them to not only understand the world but also act upon it, opening doors to a myriad of new applications and possibilities.
Extensions
The easiest way to understand Extensions is to think of them as bridging the gap between an API and an agent in a standardized way, allowing agents to seamlessly execute APIs regardless of their underlying implementation. Let’s say that you’ve built an agent with a goal of helping users book flights. You know that you want to use the Google Flights API to retrieve flight information, but you’re not sure how you’re going to get your agent to make calls to this API endpoint.
Figure 3. How do Agents interact with External APIs?
One approach could be to implement custom code that would take the incoming user query, parse the query for relevant information, then make the API call. For example, in a flight booking use case a user might state “I want to book a flight from Austin to Zurich.” In this scenario, our custom code solution would need to extract “Austin” and “Zurich” as relevant entities from the user query before attempting to make the API call. But what happens if the user says “I want to book a flight to Zurich” and never provides a departure city? The API call would fail without the required data and more code would need to be implemented in order to catch edge and corner cases like this. This approach is not scalable and could easily break in any scenario that falls outside of the implemented custom code.
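The brittleness of the custom-code approach is easy to see in a small sketch. The parser below is a hypothetical example written for this illustration (the pattern and the `parse_flight_query` name are assumptions, not part of any Google API); it handles the first query from the text and fails on the second.

```python
import re
from typing import Dict

# Hand-written pattern that only understands "from <city> to <city>".
ROUTE_PATTERN = re.compile(
    r"from (?P<origin>[A-Za-z ]+?) to (?P<destination>[A-Za-z ]+?)[.!?]?$"
)


def parse_flight_query(query: str) -> Dict[str, str]:
    """Extract the departure and destination cities, or raise ValueError."""
    match = ROUTE_PATTERN.search(query.strip())
    if match is None:
        raise ValueError("could not find both a departure and a destination city")
    return {
        "origin": match.group("origin"),
        "destination": match.group("destination"),
    }


print(parse_flight_query("I want to book a flight from Austin to Zurich."))

try:
    parse_flight_query("I want to book a flight to Zurich")
except ValueError as error:
    # The missing departure city is exactly the edge case described above.
    print(f"parse failed: {error}")
```

Every new phrasing ("to Zurich from Austin", "Austin-Zurich next Friday") would need another pattern or another branch, which is why this style of solution does not scale.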
A more resilient approach would be to use an Extension. An Extension bridges the gap between an agent and an API by:
1. Teaching the agent how to use the API endpoint using examples.
2. Teaching the agent what arguments or parameters are needed to successfully call the API endpoint.
Figure 4. Extensions connect Agents to External APIs
Extensions can be crafted independently of the agent, but should be provided as part of the agent’s configuration. The agent uses the model and examples at run time to decide which Extension, if any, would be suitable for solving the user’s query. This highlights a key strength of Extensions, their built-in example types, that allow the agent to dynamically select the most appropriate Extension for the task.
Figure 5. 1-to-many relationship between Agents, Extensions and APIs
Think of this the same way that a software developer decides which API endpoints to use while designing a solution for a user’s problem. If the user wants to book a flight, the developer might use the Google Flights API. If the user wants to know where the nearest coffee shop is relative to their location, the developer might use the Google Maps API. In this same way, the agent / model stack uses a set of known Extensions to decide which one will be the best fit for the user’s query. If you’d like to see Extensions in action, you can try them out on the Gemini application by going to Settings > Extensions and then enabling any you would like to test. For example, you could enable the Google Flights extension then ask Gemini “Show me flights from Austin to Zurich leaving next Friday.”
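As a rough illustration of example-guided selection, the sketch below picks among a set of known extensions by comparing the user query with each extension's example queries. Everything here is an assumption made for the illustration: `ExtensionSpec`, `select_extension`, the extension names, and the example queries are hypothetical, and simple word overlap is only a crude stand-in for the judgement a real model would apply.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Very common words that would otherwise create accidental matches.
STOPWORDS = {"a", "the", "to", "from", "is", "me", "my"}


@dataclass
class ExtensionSpec:
    """Hypothetical extension description: a name plus example queries."""
    name: str
    examples: List[str]


def _tokens(text: str) -> Set[str]:
    """Lowercase the words, trim punctuation, and drop stopwords."""
    words = {word.strip(".,?!").lower() for word in text.split()}
    return words - STOPWORDS


def select_extension(query: str, extensions: List[ExtensionSpec]) -> Optional[str]:
    """Return the extension whose examples best overlap the query.

    Returns None when nothing overlaps, mirroring the "if any" case.
    """
    query_tokens = _tokens(query)
    best_name, best_score = None, 0
    for ext in extensions:
        score = max(
            (len(query_tokens & _tokens(example)) for example in ext.examples),
            default=0,
        )
        if score > best_score:
            best_name, best_score = ext.name, score
    return best_name


EXTENSIONS = [
    ExtensionSpec("google_flights", ["Find flights from Boston to Paris",
                                     "Book a flight to Tokyo"]),
    ExtensionSpec("google_maps", ["Where is the nearest coffee shop",
                                  "Directions to the airport"]),
]

print(select_extension(
    "Show me flights from Austin to Zurich leaving next Friday.", EXTENSIONS))
```

The design point carried over from the text is that the examples live with each extension's configuration, so adding a new extension means adding a new entry rather than new parsing code.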
Sample Extensions
To simplify the usage of Extensions, Google provides some out of the box extensions that can be quickly imported into your project and used with minimal configurations. For example, the Code Interpreter extension in Snippet 1 allows you to generate and run Python code from a natural language description.
Python
import pprint

import vertexai
from vertexai.preview.extensions import Extension

PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"

# Initialize the Vertex AI SDK for the target project and region.
vertexai.init(project=PROJECT_ID, location=REGION)

# Load the prebuilt Code Interpreter extension from the extension hub.
extension_code_interpreter = Extension.from_hub("code_interpreter")

CODE_QUERY = """Write a python method to invert a binary tree in O(n) time."""

# Ask the extension to generate and execute code for the query.
response = extension_code_interpreter.execute(
    operation_id="generate_and_execute",
    operation_params={"query": CODE_QUERY},
)

print("Generated Code:")
pprint.pprint(response["generated_code"])

# The above snippet will generate the following code.
```
Generated Code:
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def invert_binary_tree(root):
    """
    Inverts a binary tree.

    Args:
        root: The root of the binary tree.

    Returns:
        The root of the inverted binary tree.
    """
    if not root:
        return None

    # Swap the left and right children recursively
    root.left, root.right = invert_binary_tree(root.right), invert_binary_tree(root.left)
    return root


# Example usage:
# Construct a sample binary tree
root = TreeNode(4)
root.left = TreeNode(2)
root.right = TreeNode(7)
root.left.left = TreeNode(1)
root.left.right = TreeNode(3)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)

# Invert the binary tree
inverted_root = invert_binary_tree(root)
```
Snippet 1. Code Interpreter Extension can generate and run Python code
To summarize, Extensions provide a way for agents to perceive, interact with, and influence the outside world in a myriad of ways. The selection and invocation of these Extensions is guided by the use of Examples, all of which are defined as part of the Extension configuration.
Functions
In the world of software engineering, functions are defined as self-contained modules of code that accomplish a specific task and can be reused as needed. When a software developer is writing a program, they will often create many functions to do various tasks. They will also define the logic for when to call function_a versus function_b, as well as the expected inputs and outputs.
Functions work very similarly in the world of agents, but we can replace the software developer with a model. A model can take a set of known functions and decide when to use each Function and what arguments the Function needs based on its specification. Functions differ from Extensions in a few ways, most notably:
1. A model outputs a Function and its arguments, but doesn’t make a live API call.
2. Functions are executed on the client-side, while Extensions are executed on the agent-side.
Using our Google Flights example again, a simple setup for functions might look like the example in Figure 7.
Figure 7. How do functions interact with external APIs?
Note that the main difference here is that neither the Function nor the agent interacts directly with the Google Flights API. So how does the API call actually happen?
With functions, the logic and execution of calling the actual API endpoint is offloaded away from the agent and back to the client-side application as seen in Figure 8 and Figure 9 below. This offers the developer more granular control over the flow of data in the application. There are many reasons why a Developer might choose to use functions over Extensions, but a few common use cases are:
• API calls need to be made at another layer of the application stack, outside of the direct agent architecture flow (e.g. a middleware system, a front end framework, etc.)
• Security or Authentication restrictions that prevent the agent from calling an API directly (e.g. API is not exposed to the internet, or non-accessible by agent infrastructure)
• Timing or order-of-operations constraints that prevent the agent from making API calls in real-time. (i.e. batch operations, human-in-the-loop review, etc.)
• Additional data transformation logic needs to be applied to the API Response that the agent cannot perform. For example, consider an API endpoint that doesn’t provide a filtering mechanism for limiting the number of results returned. Using Functions on the client-side provides the developer additional opportunities to make these transformations.
• The developer wants to iterate on agent development without deploying additional infrastructure for the API endpoints (i.e. Function Calling can act like “stubbing” of APIs)
While the difference in internal architecture between the two approaches is subtle as seen in Figure 8, the additional control and decoupled dependency on external infrastructure make Function Calling an appealing option for the Developer.
Figure 8
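The division of labor described above, where the model only proposes a call and the client application executes it, can be sketched as follows. This is a simplified illustration under stated assumptions: `fake_model_output` is a hard-coded stand-in for a real model response, `get_flights` is a hypothetical stub that returns placeholder data instead of calling a flights API, and the JSON shape with `name` and `args` keys is an assumed format rather than a documented one.

```python
import json
from typing import Any, Callable, Dict


def get_flights(origin: str, destination: str) -> Dict[str, Any]:
    """Client-side function; a real one would call the flights API here."""
    return {
        "origin": origin,
        "destination": destination,
        "flights": ["placeholder-flight-1", "placeholder-flight-2"],
    }


# Functions the client is willing to run, keyed by the name the model emits.
CLIENT_FUNCTIONS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_flights": get_flights,
}


def fake_model_output(user_query: str) -> str:
    """Stand-in for the model: proposes a function and arguments as JSON.

    Nothing is executed here; the model only describes the call.
    """
    return json.dumps({
        "name": "get_flights",
        "args": {"origin": "Austin", "destination": "Zurich"},
    })


def execute_on_client(model_output: str) -> Dict[str, Any]:
    """Check the proposed call against the registry, then run it locally."""
    call = json.loads(model_output)
    function = CLIENT_FUNCTIONS.get(call["name"])
    if function is None:
        raise KeyError(f"unknown function: {call['name']}")
    return function(**call["args"])


proposed = fake_model_output("I want to book a flight from Austin to Zurich.")
result = execute_on_client(proposed)
print(result)
```

Because execution happens in `execute_on_client`, the client is free to add authentication, filter or reshape the response, queue the call for human review, or swap `get_flights` for a stub during development, which matches the use cases listed above.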