Agents

Authors: Julia Wiesinger, Patrick Marlow and Vladimir Vuskovic

Google


Acknowledgements

Reviewers and Contributors

Evan Huang
Emily Xue
Olcan Sercinoglu
Sebastian Riedel
Satinder Baveja
Antonio Gulli
Anant Nawalgaria

Curators and Editors

Antonio Gulli
Anant Nawalgaria
Grace Mollison

Technical Writer

Joey Haymaker

Designer

Michael Lanning

September 2024

Table of contents

Introduction
What is an agent?
The model
The tools
The orchestration layer
Agents vs. models
Cognitive architectures: How agents operate
Tools: Our keys to the outside world
Extensions
Sample Extensions
Functions
Use cases
Function sample code
Data stores
Implementation and application
Tools recap
Enhancing model performance with targeted learning
Agent quick start with LangChain
Production applications with Vertex AI agents
Summary
Endnotes


Introduction

Humans are fantastic at messy pattern recognition tasks. However, they often rely on tools (like books, Google Search, or a calculator) to supplement their prior knowledge before arriving at a conclusion. Just like humans, Generative AI models can be trained to use tools to access real-time information or suggest a real-world action. For example, a model can leverage a database retrieval tool to access specific information, like a customer's purchase history, so it can generate tailored shopping recommendations. Alternatively, based on a user's query, a model can make various API calls to send an email response to a colleague or complete a financial transaction on your behalf. To do so, the model must not only have access to a set of external tools, it needs the ability to plan and execute any task in a self-directed fashion. This combination of reasoning, logic, and access to external information that are all connected to a Generative AI model invokes the concept of an agent, or a program that extends beyond the standalone capabilities of a Generative AI model. This whitepaper dives into all these and associated aspects in more detail.


What is an agent?

In its most fundamental form, a Generative AI agent can be defined as an application that attempts to achieve a goal by observing the world and acting upon it using the tools that it has at its disposal. Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve. Agents can also be proactive in their approach to reaching their goals. Even in the absence of explicit instruction sets from a human, an agent can reason about what it should do next to achieve its ultimate goal. While the notion of agents in AI is quite general and powerful, this whitepaper focuses on the specific types of agents that Generative AI models are capable of building at the time of publication.

In order to understand the inner workings of an agent, let’s first introduce the foundational components that drive the agent’s behavior, actions, and decision making. The combination of these components can be described as a cognitive architecture, and there are many such architectures that can be achieved by the mixing and matching of these components. Focusing on the core functionalities, there are three essential components in an agent’s cognitive architecture as shown in Figure 1.


Figure 1. General agent architecture and components

The model

In the scope of an agent, a model refers to the language model (LM) that will be utilized as the centralized decision maker for agent processes. The model used by an agent can be one or multiple LMs of any size (small / large) that are capable of following instruction-based reasoning and logic frameworks, like ReAct, Chain-of-Thought, or Tree-of-Thoughts. Models can be general purpose, multimodal or fine-tuned based on the needs of your specific agent architecture. For best production results, you should leverage a model that best fits your desired end application and, ideally, has been trained on data signatures associated with the tools that you plan to use in the cognitive architecture. It’s important to note that the model is typically not trained with the specific configuration settings (i.e. tool choices, orchestration/reasoning setup) of the agent. However, it’s possible to further refine the model for the agent’s tasks by providing it with examples that showcase the agent’s capabilities, including instances of the agent using specific tools or reasoning steps in various contexts.


The tools

Foundational models, despite their impressive text and image generation, remain constrained by their inability to interact with the outside world. Tools bridge this gap, empowering agents to interact with external data and services while unlocking a wider range of actions beyond that of the underlying model alone. Tools can take a variety of forms and have varying depths of complexity, but typically align with common web API methods like GET, POST, PATCH, and DELETE. For example, a tool could update customer information in a database or fetch weather data to influence a travel recommendation that the agent is providing to the user. With tools, agents can access and process real-world information. This empowers them to support more specialized systems like retrieval augmented generation (RAG), which significantly extends an agent’s capabilities beyond what the foundational model can achieve on its own. We’ll discuss tools in more detail below, but the most important thing to understand is that tools bridge the gap between the agent’s internal capabilities and the external world, unlocking a broader range of possibilities.
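To make the idea of a tool concrete, here is a minimal sketch of two tools described by a name and the web API method each one lines up with. Everything in it is an assumption made for illustration: the `Tool` class, the tool names, and the in-memory dictionaries that stand in for a customer database and a weather service are hypothetical and not part of any Google API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# In-memory stand-ins for external systems (hypothetical data).
CUSTOMERS = {"c1": {"name": "Ada", "city": "Austin"}}
WEATHER = {"Austin": "sunny", "Zurich": "rain"}


@dataclass(frozen=True)
class Tool:
    name: str
    method: str  # the web API verb this tool corresponds to
    run: Callable[..., Any]


def get_weather(city: str) -> str:
    """Read-only lookup, comparable to a GET request."""
    return WEATHER.get(city, "unknown")


def update_customer(customer_id: str, **fields: str) -> dict:
    """Partial update of one record, comparable to a PATCH request."""
    CUSTOMERS[customer_id].update(fields)
    return CUSTOMERS[customer_id]


# Registry the agent could consult when deciding which tool to use.
TOOLS = {
    tool.name: tool
    for tool in (
        Tool("get_weather", "GET", get_weather),
        Tool("update_customer", "PATCH", update_customer),
    )
}
```

Calling `TOOLS["get_weather"].run("Zurich")` reads data without changing it, while `TOOLS["update_customer"].run("c1", city="Zurich")` changes state in the external system, which is the distinction the HTTP verbs capture.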

The orchestration layer

The orchestration layer describes a cyclical process that governs how the agent takes in information, performs some internal reasoning, and uses that reasoning to inform its next action or decision. In general, this loop will continue until an agent has reached its goal or a stopping point. The complexity of the orchestration layer can vary greatly depending on the agent and task it’s performing. Some loops can be simple calculations with decision rules, while others may contain chained logic, involve additional machine learning algorithms, or implement other probabilistic reasoning techniques. We’ll discuss more about the detailed implementation of the agent orchestration layers in the cognitive architecture section.
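The simplest case mentioned above, a loop driven by a decision rule, can be sketched in a few lines. This is an illustration under stated assumptions rather than how any particular agent framework implements orchestration: `run_loop`, `LoopResult`, and the thermostat-style rule are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    state: int
    steps: int
    reached_goal: bool
    history: list = field(default_factory=list)


def run_loop(
    state: int,
    goal_reached: Callable[[int], bool],
    decide: Callable[[int], str],
    act: Callable[[int, str], int],
    max_steps: int = 10,
) -> LoopResult:
    """Take in state, reason about it, act, and repeat until done."""
    history = []
    for step in range(max_steps):
        if goal_reached(state):          # goal reached: stop early
            return LoopResult(state, step, True, history)
        action = decide(state)           # internal reasoning (a decision rule)
        state = act(state, action)       # the action changes the state
        history.append(action)
    # Stopping point: the step budget ran out.
    return LoopResult(state, max_steps, goal_reached(state), history)


# Example: bring a room from 18 to 21 degrees, one degree per step.
result = run_loop(
    state=18,
    goal_reached=lambda temp: temp == 21,
    decide=lambda temp: "heat" if temp < 21 else "cool",
    act=lambda temp, action: temp + 1 if action == "heat" else temp - 1,
)
```

The two exits from the loop correspond to the two stopping conditions in the text: the goal is reached, or a stopping point (here a step budget) is hit. A more capable agent would replace `decide` with a model call.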


Agents vs. models

To gain a clearer understanding of the distinction between agents and models, consider the following chart:

| Models | Agents |
| --- | --- |
| Knowledge is limited to what is available in their training data. | Knowledge is extended through the connection with external systems via tools. |
| Single inference / prediction based on the user query. Unless explicitly implemented for the model, there is no management of session history or continuous context (i.e. chat history). | Managed session history (i.e. chat history) to allow for multi-turn inference / prediction based on user queries and decisions made in the orchestration layer. In this context, a ‘turn’ is defined as an interaction between the interacting system and the agent (i.e. 1 incoming event/query and 1 agent response). |
| No native tool implementation. | Tools are natively implemented in agent architecture. |
| No native logic layer implemented. Users can form prompts as simple questions or use reasoning frameworks (CoT, ReAct, etc.) to form complex prompts to guide the model in prediction. | Native cognitive architecture that uses reasoning frameworks like CoT, ReAct, or other pre-built agent frameworks like LangChain. |
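The session-history row of the chart can be illustrated with a small sketch. It assumes a stand-in `echo_model` function in place of a real model call, and a hypothetical `Session` class that plays the role of the agent's managed history; neither name comes from the whitepaper.

```python
from dataclasses import dataclass, field
from typing import Callable


def echo_model(messages: list[str]) -> str:
    """Stand-in for a model call: reports how much context it was given."""
    return f"reply based on {len(messages)} message(s)"


@dataclass
class Session:
    model: Callable[[list[str]], str]
    history: list[str] = field(default_factory=list)

    def turn(self, user_query: str) -> str:
        """One turn: one incoming query and one agent response."""
        self.history.append(f"user: {user_query}")
        response = self.model(self.history)  # the model sees the whole history
        self.history.append(f"agent: {response}")
        return response


# A bare model call only ever sees the current query.
stateless_reply = echo_model(["user: And the return flight?"])

# The session carries earlier turns into later ones.
session = Session(model=echo_model)
first_reply = session.turn("Find flights to Zurich")
second_reply = session.turn("And the return flight?")
```

On the second turn the session passes three messages to the model (two from the first turn plus the new query), whereas the stateless call passes one.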

Cognitive architectures: How agents operate

Imagine a chef in a busy kitchen. Their goal is to create delicious dishes for restaurant patrons, which involves some cycle of planning, execution, and adjustment.

• They gather information, like the patron’s order and what ingredients are in the pantry and refrigerator.
• They perform some internal reasoning about what dishes and flavor profiles they can create based on the information they have just gathered.
• They take action to create the dish: chopping vegetables, blending spices, searing meat.

At each stage in the process the chef makes adjustments as needed, refining their plan as ingredients are depleted or customer feedback is received, and uses the set of previous outcomes to determine the next plan of action. This cycle of information intake, planning, executing, and adjusting describes a unique cognitive architecture that the chef employs to reach their goal.

Just like the chef, agents can use cognitive architectures to reach their end goals by iteratively processing information, making informed decisions, and refining next actions based on previous outputs. At the core of agent cognitive architectures lies the orchestration layer, responsible for maintaining memory, state, reasoning and planning. It uses the rapidly evolving field of prompt engineering and associated frameworks to guide reasoning and planning, enabling the agent to interact more effectively with its environment and complete tasks. Research in the area of prompt engineering frameworks and task planning for language models is rapidly evolving, yielding a variety of promising approaches. While not an exhaustive list, these are a few of the most popular frameworks and reasoning techniques available at the time of this publication:

• ReAct, a prompt engineering framework that provides a thought process strategy for language models to Reason and take action on a user query, with or without in-context examples. ReAct prompting has been shown to outperform several SOTA baselines and improve human interpretability and trustworthiness of LLMs.
• Chain-of-Thought (CoT), a prompt engineering framework that enables reasoning capabilities through intermediate steps. There are various sub-techniques of CoT including self-consistency, active-prompt, and multimodal CoT that each have strengths and weaknesses depending on the specific application.
• Tree-of-thoughts (ToT), a prompt engineering framework that is well suited for exploration or strategic lookahead tasks. It generalizes over chain-of-thought prompting and allows the model to explore various thought chains that serve as intermediate steps for general problem solving with language models.

Agents can utilize one of the above reasoning techniques, or many other techniques, to choose the next best action for the given user request. For example, let’s consider an agent that is programmed to use the ReAct framework to choose the correct actions and tools for the user query. The sequence of events might go something like this:

1. User sends query to the agent
2. Agent begins the ReAct sequence
3. The agent provides a prompt to the model, asking it to generate one of the next ReAct steps and its corresponding output:
   a. Question: The input question from the user query, provided with the prompt
   b. Thought: The model’s thoughts about what it should do next
   c. Action: The model’s decision on what action to take next
      i. This is where tool choice can occur
      ii. For example, an action could be one of [Flights, Search, Code, None], where the first 3 represent a known tool that the model can choose, and the last represents “no tool choice”
   d. Action input: The model’s decision on what inputs to provide to the tool (if any)
   e. Observation: The result of the action / action input sequence
      i. This thought / action / action input / observation could repeat N-times as needed
   f. Final answer: The model’s final answer to provide to the original user query
4. The ReAct loop concludes and a final answer is provided back to the user
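The sequence above can be sketched as a short loop. This is a simplified illustration, not the ReAct reference implementation: `scripted_model` is a hypothetical stand-in that returns fixed steps instead of calling a language model, and `flights_tool` returns canned text instead of querying a flights service.

```python
from typing import Callable


def flights_tool(query: str) -> str:
    """Hypothetical tool: canned data in place of a real flights lookup."""
    return f"2 flights found for '{query}': 08:15 and 14:40"


# Known tools the model may choose from; "None" means no tool choice.
TOOLS: dict[str, Callable[[str], str]] = {"Flights": flights_tool}


def scripted_model(prompt: str) -> dict:
    """Stand-in for an LM call: picks the next ReAct step from the prompt so far."""
    if "Observation:" not in prompt:
        return {
            "thought": "I need live flight data.",
            "action": "Flights",
            "action_input": "Austin to Zurich",
        }
    observation = prompt.rsplit("Observation: ", 1)[1].strip()
    return {
        "thought": "I have what I need.",
        "action": "None",
        "final_answer": f"Here is what I found. {observation}",
    }


def react(question: str, model=scripted_model, max_turns: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_turns):
        step = model(prompt)  # model produces Thought and Action
        prompt += f"Thought: {step['thought']}\nAction: {step['action']}\n"
        if step["action"] == "None":  # no tool needed: return the final answer
            return step["final_answer"]
        observation = TOOLS[step["action"]](step["action_input"])
        prompt += (
            f"Action input: {step['action_input']}\n"
            f"Observation: {observation}\n"
        )
    return "Stopped: turn limit reached without a final answer."


answer = react("Find flights from Austin to Zurich")
```

Each pass through the loop appends the thought, action, action input, and observation to the prompt, so the next model call can reason over everything gathered so far. The `max_turns` limit is an added safeguard, assumed here, against a model that never settles on a final answer.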

Figure 2. Example agent with ReAct reasoning in the orchestration layer

As shown in Figure 2, the model, tools, and agent configuration work together to provide a grounded, concise response back to the user based on the user’s original query. While the model could have guessed at an answer (hallucinated) based on its prior knowledge, it instead used a tool (Flights) to search for real-time external information. This additional information was provided to the model, allowing it to make a more informed decision based on real factual data and to summarize this information back to the user.


In summary, the quality of agent responses can be tied directly to the model’s ability to reason and act on these various tasks, including the ability to select the right tools, and how well those tools have been defined. Like a chef crafting a dish with fresh ingredients and attentive to customer feedback, agents rely on sound reasoning and reliable information to deliver optimal results. In the next section, we’ll dive into the various ways agents connect with fresh data.

Tools: Our keys to the outside world

While language models excel at processing information, they lack the ability to directly perceive and influence the real world. This limits their usefulness in situations requiring interaction with external systems or data. This means that, in a sense, a language model is only as good as what it has learned from its training data. But regardless of how much data we throw at a model, it still lacks the fundamental ability to interact with the outside world. So how can we empower our models to have real-time, context-aware interaction with external systems? Functions, Extensions, Data Stores and Plugins are all ways to provide this critical capability to the model.

While they go by many names, tools are what create a link between our foundational models and the outside world. This link to external systems and data allows our agent to perform a wider variety of tasks and do so with more accuracy and reliability. For instance, tools can enable agents to adjust smart home settings, update calendars, fetch user information from a database, or send emails based on a specific set of instructions.

As of the date of this publication, there are three primary tool types that Google models are able to interact with: Extensions, Functions, and Data Stores. By equipping agents with tools, we unlock a vast potential for them to not only understand the world but also act upon it, opening doors to a myriad of new applications and possibilities.


Extensions

The easiest way to understand Extensions is to think of them as bridging the gap between an API and an agent in a standardized way, allowing agents to seamlessly execute APIs regardless of their underlying implementation. Let’s say that you’ve built an agent with a goal of helping users book flights. You know that you want to use the Google Flights API to retrieve flight information, but you’re not sure how you’re going to get your agent to make calls to this API endpoint.

Figure 3. How do Agents interact with External APIs?

One approach could be to implement custom code that would take the incoming user query, parse the query for relevant information, then make the API call. For example, in a flight booking use case a user might state “I want to book a flight from Austin to Zurich.” In this scenario, our custom code solution would need to extract “Austin” and “Zurich” as relevant entities from the user query before attempting to make the API call. But what happens if the user says “I want to book a flight to Zurich” and never provides a departure city? The API call would fail without the required data and more code would need to be implemented in order to catch edge and corner cases like this. This approach is not scalable and could easily break in any scenario that falls outside of the implemented custom code.
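The brittleness of the custom-code approach is easy to reproduce. The sketch below assumes a regular expression as the parsing strategy and a hypothetical `search_flights` function standing in for the flights API; a real implementation might parse differently, but it would face the same problem.

```python
import re


def search_flights(origin: str, destination: str) -> str:
    """Hypothetical stand-in for an API call that requires both cities."""
    return f"Searching flights {origin} -> {destination}"


# The pattern only covers the phrasing "from <city> to <city>".
PATTERN = re.compile(
    r"from (?P<origin>\w+) to (?P<destination>\w+)", re.IGNORECASE
)


def handle_query(query: str) -> str:
    match = PATTERN.search(query)
    if match is None:
        # Every phrasing the pattern did not anticipate ends up here.
        raise ValueError(
            f"Could not extract origin and destination from: {query!r}"
        )
    return search_flights(match["origin"], match["destination"])
```

`handle_query("I want to book a flight from Austin to Zurich.")` succeeds, while `handle_query("I want to book a flight to Zurich")` raises `ValueError`. Each new phrasing, missing field, or multi-word city name would need another special case.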


A more resilient approach would be to use an Extension. An Extension bridges the gap between an agent and an API by:

1. Teaching the agent how to use the API endpoint using examples.
2. Teaching the agent what arguments or parameters are needed to successfully call the API endpoint.

Figure 4. Extensions connect Agents to External APIs

Extensions can be crafted independently of the agent, but should be provided as part of the agent’s configuration. The agent uses the model and examples at run time to decide which Extension, if any, would be suitable for solving the user’s query. This highlights a key strength of Extensions: their built-in example types, which allow the agent to dynamically select the most appropriate Extension for the task.

Figure 5. 1-to-many relationship between Agents, Extensions and APIs
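As a rough illustration of example-guided selection, the sketch below describes each Extension by its required parameters and a few example queries, then picks the best match for an incoming query. The `ExtensionSpec` class and the word-overlap scoring are assumptions made for this example; in a real agent the model itself makes this decision, and the actual Extension configuration format is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExtensionSpec:
    name: str
    required_params: tuple[str, ...]
    examples: tuple[str, ...]  # example queries; assumed non-empty


def _words(text: str) -> set[str]:
    return {word.strip(".,?!").lower() for word in text.split()}


def select_extension(
    query: str, extensions: list[ExtensionSpec], min_overlap: int = 2
) -> Optional[ExtensionSpec]:
    """Pick the Extension whose examples share the most words with the query.

    Word overlap is a crude stand-in for the model's own judgment.
    """
    best, best_score = None, 0
    for ext in extensions:
        score = max(len(_words(query) & _words(ex)) for ex in ext.examples)
        if score > best_score:
            best, best_score = ext, score
    # "If any": return nothing when no Extension is a plausible match.
    return best if best_score >= min_overlap else None


EXTENSIONS = [
    ExtensionSpec(
        name="flights",
        required_params=("origin", "destination"),
        examples=("Show me flights from Austin to Zurich", "Book a flight to Paris"),
    ),
    ExtensionSpec(
        name="maps",
        required_params=("place_type",),
        examples=("Where is the nearest coffee shop", "Find a pharmacy near me"),
    ),
]

chosen = select_extension("Are there flights from Denver to Zurich on Friday?", EXTENSIONS)
```

Here `chosen.name` is `"flights"`, and `chosen.required_params` tells the agent which arguments it still has to collect before the API can be called, which is how an Extension avoids the missing-departure-city failure described earlier.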


Think of this the same way that a software developer decides which API endpoints to use while solving a user’s problem. If the user wants to book a flight, the developer might use the Google Flights API. If the user wants to know where the nearest coffee shop is relative to their location, the developer might use the Google Maps API. In this same way, the agent / model stack uses a set of known Extensions to decide which one will be the best fit for the user’s query. If you’d like to see Extensions in action, you can try them out on the Gemini application by going to Settings > Extensions and then enabling any you would like to test. For example, you could enable the Google Flights extension then ask Gemini “Show me flights from Austin to Zurich leaving next Friday.”

Sample Extensions

To simplify the usage of Extensions, Google provides some out of the box extensions that can be quickly imported into your project and used with minimal configurations. For example, the Code Interpreter extension in Snippet 1 allows you to generate and run Python code from a natural language description.


Python

import vertexai
import pprint

PROJECT_ID = "YOUR_PROJECT_ID"
REGION = "us-central1"

vertexai.init(project=PROJECT_ID, location=REGION)

from vertexai.preview.extensions import Extension

extension_code_interpreter = Extension.from_hub("code_interpreter")

CODE_QUERY = """Write a python method to invert a binary tree in O(n) time."""

response = extension_code_interpreter.execute(
    operation_id="generate_and_execute",
    operation_params={"query": CODE_QUERY},
)

print("Generated Code:")
pprint.pprint({response['generated_code']})

# The above snippet will generate the following code.

```
Generated Code:
class TreeNode:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def invert_binary_tree(root):
    """
    Inverts a binary tree.

    Args:
        root: The root of the binary tree.

    Returns:
        The root of the inverted binary tree.
    """
    if not root:
        return None

    # Swap the left and right children recursively
    root.left, root.right = invert_binary_tree(root.right), invert_binary_tree(root.left)
    return root

# Example usage:
# Construct a sample binary tree
root = TreeNode(4)
root.left = TreeNode(2)
root.right = TreeNode(7)
root.left.left = TreeNode(1)
root.left.right = TreeNode(3)
root.right.left = TreeNode(6)
root.right.right = TreeNode(9)

# Invert the binary tree
inverted_root = invert_binary_tree(root)
```

Snippet 1. Code Interpreter Extension can generate and run Python code


To summarize, Extensions provide a way for agents to perceive, interact with, and influence the outside world in a myriad of ways. The selection and invocation of these Extensions is guided by the use of Examples, all of which are defined as part of the Extension configuration.

Functions

In the world of software engineering, functions are defined as self-contained modules of code that accomplish a specific task and can be reused as needed. When a software developer is writing a program, they will often create many functions to do various tasks. They will also define the logic for when to call function_a versus function_b, as well as the expected inputs and outputs.

Functions work very similarly in the world of agents, but we can replace the software developer with a model. A model can take a set of known functions and decide when to use each Function and what arguments the Function needs based on its specification. Functions differ from Extensions in a few ways, most notably:

1. A model outputs a Function and its arguments, but doesn’t make a live API call.
2. Functions are executed on the client-side, while Extensions are executed on the agent-side.
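The two differences listed above can be sketched as follows. The example assumes a hypothetical `fake_model` that returns a fixed function call as JSON, where a real model would choose the function and arguments from the specification. The spec format, function name, and flight data are all illustrative and do not reflect a real API.

```python
import json

# What the model is told about: function names and required arguments only.
FUNCTION_SPECS = {
    "search_flights": {"required": ["origin", "destination"]},
}


def fake_model(query: str) -> str:
    """Stand-in for a model: outputs a function name and arguments as JSON.

    It never contacts the flights service itself.
    """
    return json.dumps(
        {
            "name": "search_flights",
            "args": {"origin": "Austin", "destination": "Zurich"},
        }
    )


def search_flights(origin: str, destination: str) -> list[str]:
    """Client-side implementation; canned data in place of a live API call."""
    return [f"{origin}-{destination} 08:15", f"{origin}-{destination} 14:40"]


CLIENT_FUNCTIONS = {"search_flights": search_flights}


def client_handle(query: str) -> list[str]:
    call = json.loads(fake_model(query))  # 1. the model only proposes a call
    spec = FUNCTION_SPECS[call["name"]]
    missing = [p for p in spec["required"] if p not in call["args"]]
    if missing:
        raise ValueError(f"Model omitted required arguments: {missing}")
    # 2. the client decides whether and how to execute it
    return CLIENT_FUNCTIONS[call["name"]](**call["args"])


flights = client_handle("I want to book a flight from Austin to Zurich")
```

Because execution happens in `client_handle`, the client can validate, log, delay, or refuse the call before anything reaches the external service.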

Using our Google Flights example again, a simple setup for functions might look like the example in Figure 7.


Figure 7. How do functions interact with external APIs?

Note that the main difference here is that neither the Function nor the agent interacts directly with the Google Flights API. So how does the API call actually happen?

With functions, the logic and execution of calling the actual API endpoint is offloaded away from the agent and back to the client-side application as seen in Figure 8 and Figure 9 below. This offers the developer more granular control over the flow of data in the application. There are many reasons why a Developer might choose to use functions over Extensions, but a few common use cases are:

• API calls need to be made at another layer of the application stack, outside of the direct agent architecture flow (e.g. a middleware system, a front end framework, etc.)
• Security or Authentication restrictions that prevent the agent from calling an API directly (e.g. API is not exposed to the internet, or non-accessible by agent infrastructure)
• Timing or order-of-operations constraints that prevent the agent from making API calls in real-time (e.g. batch operations, human-in-the-loop review, etc.)
• Additional data transformation logic needs to be applied to the API Response that the agent cannot perform. For example, consider an API endpoint that doesn’t provide a filtering mechanism for limiting the number of results returned. Using Functions on the client-side provides the developer additional opportunities to make these transformations.
• The developer wants to iterate on agent development without deploying additional infrastructure for the API endpoints (i.e. Function Calling can act like “stubbing” of APIs)

While the difference in internal architecture between the two approaches is subtle as seen in Figure 8, the additional control and decoupled dependency on external infrastructure make Function Calling an appealing option for the Developer.
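Two of the use cases above, client-side data transformation and stubbing, can be combined in one short sketch. It assumes a hypothetical `stub_search_flights` backend that returns canned, unfiltered rows and a hypothetical `execute_call` helper; the names and data are illustrative only.

```python
from typing import Callable


def stub_search_flights(origin: str, destination: str) -> list[dict]:
    """Stub for an endpoint that is not deployed yet; returns unfiltered rows."""
    return [
        {"flight": f"{origin}-{destination} #{i}", "price": 900 - 50 * i}
        for i in range(1, 8)
    ]


def execute_call(
    call: dict,
    backend: Callable[..., list[dict]],
    max_results: int = 3,
) -> list[dict]:
    """Run the function call the model proposed, then trim the response."""
    rows = backend(**call["args"])
    cheapest_first = sorted(rows, key=lambda row: row["price"])
    # The endpoint has no limit parameter, so the client truncates the list.
    return cheapest_first[:max_results]


proposed_call = {
    "name": "search_flights",
    "args": {"origin": "Austin", "destination": "Zurich"},
}
results = execute_call(proposed_call, backend=stub_search_flights)
```

Swapping `stub_search_flights` for a function that calls the real endpoint would not require any change on the agent side, since the model only ever produces `proposed_call`.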

Figure 8
