




Speaker: Mao Hangyu, Kuaishou Technology. He heads knowledge-enhancement R&D for the KwaiYii (快意) large model at Kuaishou and concurrently leads the intelligent interaction team. His research focuses on Agents, RAG, Alignment, RL, and LLMs; he has published 30+ papers at CCF-A/B venues such as ICLR, NeurIPS, and ICML, filed 10+ international and domestic patents, and deployed the related research in business scenarios with substantial impact. He has served as PC member, Senior PC member, and Area Chair for the conferences above, as forum chair of the China Conference on Data Mining (CCDM), and as an executive committee member of the CCF multi-agent group. He and his team have won the Global Digital Economy Conference award for "Typical Cases of AI Large-Model Applications", first place in a NeurIPS reinforcement learning competition, the CCF Outstanding Doctoral Dissertation Award for multi-agent research, Beijing's Outstanding (Doctoral) Graduate award, and Huawei's "Innovation Pioneer President Award".

Talk title: From Reinforcement Learning (Multi-)Agents to Large Language Model (Multi-)Agents

2. Selected representative works: from RL (multi-)agents to LLM (multi-)agents

Overview reference: https://lilianweng.github.io/posts/2023-06-23-agent/

Timeline of representative works, from RL/MARL to LLM agents:
- 2016-01: AlphaGo; distributional RL (C51 / QR-DQN); Evolution Strategies; the Transformer
- MARL communication: CommNet / BiCNet / ACCNet; ATOC / IC3Net / Gated-ACML
- 2017: MADDPG; 2018: VDN / QMIX; 2019: ATT-MADDPG; follow-ups on grouping, roles, graphs, and attention; cognition consistency (NCC-MARL); permutation-invariant/equivariant networks
- 2022-05: Generalist Agent; prompt tuning; 2022: Bootstrapped Transformer (BooT); 2023: TIT / PDiT
- Llama / Llama-2; GPT-3.5 / GPT-4
- 2023-03-23: ChatGPT plugins (OpenAI); 2023-06-23: "LLM Powered Agents" (Lilian Weng); 2023-08-07: TPTU; 2023-08-22: agent survey from Renmin University; 2023-09-14: agent survey from Fudan University; 2023-11-19: TPTU-2; DS-Agent; Sheet/SQL Agent; ToolGen
- 2023: Generative Agents (the Stanford "town"); RecAgent / EconAgent; ChatDev / ChatEval; AgentGen / AgentVerse
- 2024: LLM Agent Operating System; Automated Design of Agentic Systems; STEER; o1

Agent application paradigms: compared with RL agents, LLM agents need no retraining from scratch, since fine-tuning or even prompt engineering suffices, while classic RL agents are trained per task over small state spaces; the slides also note what the two paradigms share.

Recurring works and venues cited across the slides: TPTU-2; EMNLP; the ICLR 2024 LLM Agent workshop; AAAI 2020; AAMAS 2024.

RL can already solve single-skill tasks such as Atari. But do agents have the ability to learn continually in open environments? Minecraft has become a natural proving ground:
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." Distributed Artificial Intelligence: Third International Conference, DAI 2021.
- Guss, William Hebgen, et al. "Towards robust and domain agnostic reinforcement learning competitions: MineRL 2020." NeurIPS 2020 Competition and Demonstration Track. PMLR, 2021.

In SEIHAI, training the scheduler boils down to a classification task.

Gated-ACML: multi-agent communication is a classic research topic, but in real problems communication bandwidth is limited. The method prunes messages with a gate; the key design questions are how to set the gating threshold T (dynamically, as in the slide's figure, or statically), and the observation that message pruning can be cast as binary classification.
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.

PTDE: prior work rethinks, at the framework level, what forms cooperative MARL could take: ① what effect centralized information has on each actor; ② how to personalize the same centralized information for different agents; ③ knowledge distillation to guarantee decentralized execution, distilling the global state into local observations rather than compressing the model.
- Chen, Yiqun, Mao Hangyu, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv preprint arXiv:2210.08872 (2022). Accepted by IJCAI 2024.
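PTDE's third ingredient, distilling knowledge from the global state into a network conditioned only on local observations, can be sketched in a few lines. The code below is a minimal illustration and not the paper's implementation: the linear teacher/student networks, the dimensions, and the data distribution are all invented for the example; only the objective (make the student's observation-conditioned embedding match the teacher's state-conditioned one) reflects the idea above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: the teacher sees the global state, the student only a local observation.
STATE_DIM, OBS_DIM, EMB_DIM = 8, 4, 3

# Frozen "teacher": maps the global state to a personalized embedding
# (a stand-in for a state-conditioned network; weights are random here).
W_teacher = rng.normal(size=(EMB_DIM, STATE_DIM))

def teacher_embedding(state):
    return W_teacher @ state

# Linear "student" that only sees the local observation. Distilling the
# teacher's embedding into it keeps execution fully decentralized.
W_student = np.zeros((EMB_DIM, OBS_DIM))

def distill_step(state, obs, lr=0.1):
    """One SGD step on the distillation loss ||student(obs) - teacher(state)||^2."""
    global W_student
    err = W_student @ obs - teacher_embedding(state)  # prediction error
    W_student -= lr * np.outer(err, obs)              # gradient step (constants folded into lr)
    return float(np.mean(err ** 2))

# Here the local observation is simply the first OBS_DIM entries of the state.
losses = [distill_step(s, s[:OBS_DIM]) for s in rng.normal(size=(500, STATE_DIM))]
print(f"mean loss, first 50 steps: {np.mean(losses[:50]):.2f}; "
      f"last 50 steps: {np.mean(losses[-50:]):.2f}")
```

The loss falls but does not reach zero, because the student cannot recover what the unseen state dimensions contribute to the teacher's embedding.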
See the PTDE paper for more results with VDN/MAPPO on Google Research Football and learning-to-rank tasks.

ATT-MADDPG: how can the mainstream MADDPG method be improved? The deeper question: how to model teammates' continually changing joint policy, and update one's own policy against it.
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019.

NCC-MARL: how can multiple agents cooperate as well as humans do? What characteristics do humans exhibit when cooperating? The answer explored here is neighborhood cognition consistency.
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 05. 2020.

Exploiting structural properties of multi-agent systems, namely permutation invariance and equivariance:
- Hao, Xiaotian, Mao Hangyu, et al. "Breaking the curse of dimensionality in multiagent state space: A unified agent permutation framework." arXiv preprint arXiv:2203.05285 (2022).
- Hao, Jianye, Hao Xiaotian, Mao Hangyu, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." The Eleventh ICLR. 2023.

A more detailed survey of MARL/game research is covered elsewhere in the talk.

TIT/PDiT: how should the Transformer architecture be adapted to RL? By handling perception and decision-making with dedicated, interleaved Transformers. It works well: under the same algorithm it outperforms DT, GATO, and StARformer.
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv preprint arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." The 23rd International Conference on Autonomous Agents and MultiAgent Systems. 2024.

Stackelberg Decision Transformer: how should the Transformer be adapted to MARL? Its sequential structure matches asynchronous decision-making exactly. It works well: under certain conditions it converges to a Stackelberg equilibrium.
- Zhang, Bin, Hangyu Mao, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv preprint arXiv:2305.07856 (2023).
- Zhang, Bin, Hangyu Mao, et al. "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach." ICML 2024.

TPTU: LLMs now possess broad general knowledge and can be regarded as a general-purpose "world model". Mapped to RL concepts, tool use plays the role of the action space / external environment, and the key difficulty is the long-term, multi-step decision-making that RL targets. The papers evaluate task planning ability (§3.2) and tool usage ability (§3.3), and report insightful observations (§3.4).
- Ruan, Jingqing, et al. "TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents." NeurIPS 2023 Foundation Models for Decision Making Workshop. 2023.
- Kong, Yilun, et al. "TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world systems." ICLR 2024 LLM Agent Workshop. 2024.

TPTU-2 notes: "Since the ubiquity and satisfactory performance of existing fine-tuning methods, such as SFT, LoRA, and QLoRA, we shift our ..." (quote truncated in the source). Can you guess the details of the DemoRetriever?

LLM-empowered large-scale multi-agent systems, viewed from a MARL perspective: where are the key difficulties? The proposed actor-critic approach addresses non-stationarity, balances exploration through an accessor, and uses feedback to make exchanged information more concise and accurate, reducing the number of iterations and hence the token count.
- Zhang, Bin, Hangyu Mao, et al. "Controlling large language model-based agents for large-scale decision-making: An actor-critic approach." ICLR 2024 LLM Agent Workshop. 2024.

Technical development directions: ATT-MADDPG (AAMAS 2019); AAMAS 2024; TPTU-2; X-Light / CoSLight.

Application landscape:
- Serious business scenarios, efficiency-first (ToC/B/G): news reporting and legal-complaint generation, where hallucinations are hard to control and compliance and safety are hard to guarantee; Copilot and Code Raccoon (代码小浣熊), where the value of short code snippets is hard to measure; SQLBench / PET-SQL and personal assistants (e.g. leave requests) built on TPTU/TPTU-2, which are nearly closed-set, so hallucinations are easy to control and value is easy to measure, though of the "icing on the cake" kind; fake-order and malicious-review detection.
- Entertainment scenarios, emotional-resonance-first (ToC): Character.AI, "AI小快", creative generation.
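The slides pose the DemoRetriever's details as a question, so here is one generic possibility rather than TPTU-2's actual design: retrieve the stored demonstrations most similar to the incoming query and prepend them to the prompt as few-shot examples. The word-overlap similarity, the demonstration strings, and the tool names below are all illustrative stand-ins; a real system would use learned sentence embeddings.

```python
def overlap_score(query, demo):
    """Word-overlap (Jaccard) similarity: a cheap stand-in for the
    cosine similarity of learned sentence embeddings."""
    q, d = set(query.lower().split()), set(demo.lower().split())
    return len(q & d) / max(1, len(q | d))

def retrieve_demos(query, demo_pool, k=2):
    """Return the k demonstrations most similar to the query, to be
    prepended to the LLM prompt as few-shot examples."""
    return sorted(demo_pool, key=lambda d: -overlap_score(query, d))[:k]

# Illustrative demonstration pool (tool names are made up).
demos = [
    "user asks for weather -> call weather_api(city)",
    "user asks to book a meeting -> call calendar_api(slot)",
    "user asks for weather tomorrow -> call weather_api(city, date)",
]

top = retrieve_demos("what is the weather in beijing", demos, k=2)
print(top)
```

Swapping `overlap_score` for embedding cosine similarity changes nothing structurally; the rank-then-truncate shape of the retriever stays the same.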
Code: TRL-Agents, TPTU-v2.

Reference papers (ours):
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." Distributed Artificial Intelligence: Third International Conference, DAI 2021, Shanghai, China, December 17-18, 2021, Proceedings 3. Springer International Publishing, 2022.
- Mao, Hangyu, et al. "ACCNet: Actor-coordinator-critic net for 'Learning-to-communicate' with deep multi-agent reinforcement learning." arXiv preprint arXiv:1706.03235 (2017).
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019.
- Chen, Yiqun, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv preprint arXiv:2210.08872 (2022). Accepted by IJCAI 2024.
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 05. 2020.
- Hao, Jianye, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." The Eleventh International Conference on Learning Representations. 2023.
- Zhang, Xianjie, et al. "Structural relational inference actor-critic for multi-agent reinforcement learning." Neurocomputing 459 (2021): 383-394.
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv preprint arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." arXiv preprint arXiv:2312.15863 (2023). Accepted by AAMAS 2024.
- Zhang, Bin, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv preprint arXiv:2305.07856 (2023). Published as "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach." ICML 2024.
- Ruan, Jingqing, et al. "TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents." NeurIPS 2023 Foundation Models for Decision Making Workshop. 2023.
- Kong, Yilun, et al. "TPTU-v2: Boosting task planning and tool usage of large language model-based agents in real-world systems." arXiv preprint arXiv:2311.11315. Accepted by ICLR 2024 LLM Agent workshop.
- Zhang, Bin, et al. "Controlling large language model-based agents for large-scale decision-making: An actor-critic approach." arXiv preprint arXiv:2311.13884. Accepted by ICLR 2024 LLM Agent workshop.

Summary and reflections:
- Zhang, Bin, et al. "Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation." arXiv preprint arXiv:2403.02951 (2024).
- Li, Zhishuai, et al. "PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency." arXiv preprint arXiv:2403.09732 (2024).
- Sui, Guanghu, et al. "Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain." arXiv preprint arXiv:2310.18752 (2023).
- Jiang, Haoyuan, et al. "A general scenario-agnostic reinforcement learning for traffic signal control." IEEE Transactions on Intelligent Transportation Systems (2024).
- Lu, Jiaming, et al. "DuaLight: Enhancing Traffic Signal Control by Leveraging Scenario-Specific and Scenario-Shared Knowledge." arXiv preprint arXiv:2312.14532 (2023). Accepted by AAMAS 2024.
- Jiang, Haoyuan, et al. "X-Light: Cross-City Traffic Signal Control Using Transformer on Transformer as Meta Multi-Agent Reinforcement Learner." arXiv preprint arXiv:2404.12090 (2024). IJCAI 2024.
- Ruan, Jingqing, et al. "CoSLight: Co-optimizing Collaborator Selection and Decision-making to Enhance Traffic Signal Control." Accepted by KDD 2024.
- Kong, Yilun, et al. "QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning." arXiv preprint arXiv:2408.10504 (2024).

Reference papers 1. DRL Foundation (2015-2017):
- [1] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
- [2] Schulman, John, et al. "Trust Region Policy Optimization." arXiv preprint arXiv:1502.05477 (2015).
- [3] Schulman, John, et al. "High-dimensional continuous control using generalized advantage estimation." arXiv preprint arXiv:1506.02438 (2015).
- [4] Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015).
- [5] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
- [6] Schulman, John, et al. "Proximal policy optimization algorithms." arXiv preprint arXiv:1707.06347 (2017).

DRL in 2018-2020:
- Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al., 2017. Algorithm: Rainbow DQN.
- A Distributional Perspective on Reinforcement Learning, Bellemare et al., 2017. Algorithm: C51.
- Distributional Reinforcement Learning with Quantile Regression, Dabney et al., 2017. Algorithm: QR-DQN.
- Evolution Strategies as a Scalable Alternative to Reinforcement Learning, Salimans et al., 2017. Algorithm: ES.
- Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, Nagabandi et al., 2017. Algorithm: MBMF. (model is learned)
- Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al., 2017. Algorithm: AlphaZero. (model is given)
- IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, Espeholt et al., 2018. Algorithm: IMPALA.
- Data-Efficient Hierarchical Reinforcement Learning, Nachum et al., 2018. Algorithm: HIRO.
- Mao, Hangyu, et al. "SEIHAI: A sample-efficient hierarchical AI for the MineRL competition." DAI 2021. Springer International Publishing, 2022.
- Fu, Justin, et al. "D4RL: Datasets for deep data-driven reinforcement learning." arXiv preprint arXiv:2004.07219 (2020).

Reference papers 2. MARL Communication:
- Mao, Hangyu, et al. "ACCNet: Actor-coordinator-critic net for 'Learning-to-communicate' with deep multi-agent reinforcement learning." arXiv preprint arXiv:1706.03235 (2017).
- Mao, Hangyu, et al. "Learning agent communication under limited bandwidth by message pruning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.

MARL CTDE:
- Lowe, Ryan, et al. "Multi-agent actor-critic for mixed cooperative-competitive environments." Advances in Neural Information Processing Systems 30 (2017).
- Mao, Hangyu, et al. "Modelling the Dynamic Joint Policy of Teammates with Attention Multi-agent DDPG." Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems. 2019.
- Sunehag, Peter, et al. "Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward." Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. 2018.
- Rashid, Tabish, et al. "QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning." International Conference on Machine Learning. PMLR, 2018.
- Yu, Chao, et al. "The surprising effectiveness of PPO in cooperative multi-agent games." Advances in Neural Information Processing Systems 35 (2022): 24611-24624.
- Chen, Yiqun, et al. "PTDE: Personalized training with distilled execution for multi-agent reinforcement learning." arXiv preprint arXiv:2210.08872 (2022). Accepted by IJCAI 2024.

MARL in 2020-2021:
- Mao, Hangyu, et al. "Neighborhood cognition consistent multi-agent reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 05. 2020.
- Hao, Jianye, et al. "Boosting multiagent reinforcement learning via permutation invariant and permutation equivariant networks." The Eleventh International Conference on Learning Representations. 2023.
- Zhang, Xianjie, et al. "Structural relational inference actor-critic for multi-agent reinforcement learning." Neurocomputing 459 (2021): 383-394.

Reference papers 3. Transformer-based RL (TRL) Foundation (2021-2022):
- Chen, Lili, et al. "Decision transformer: Reinforcement learning via sequence modeling." Advances in Neural Information Processing Systems 34 (2021): 15084-15097.
- Janner, Michael, Qiyang Li, and Sergey Levine. "Offline reinforcement learning as one big sequence modeling problem." Advances in Neural Information Processing Systems 34 (2021): 1273-1286.
- Reed, Scott, et al. "A generalist agent." arXiv preprint arXiv:2205.06175 (2022).
- Brohan, Anthony, et al. "RT-1: Robotics transformer for real-world control at scale." arXiv preprint arXiv:2212.06817 (2022).

TRL in 2023-2024:
- Mao, Hangyu, et al. "Transformer in transformer as backbone for deep reinforcement learning." arXiv preprint arXiv:2212.14538 (2022).
- Mao, Hangyu, et al. "PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning." arXiv preprint arXiv:2312.15863 (2023). Accepted by AAMAS 2024.
- Siebenborn, Max, et al. "How crucial is transformer in decision transformer?" arXiv preprint arXiv:2211.14655 (2022).
- Lee, Kuang-Huei, et al. "Multi-game decision transformers." Advances in Neural Information Processing Systems 35 (2022): 27921-27936.
- Paster, Keiran, Sheila McIlraith, and Jimmy Ba. "You can't count on luck: Why decision transformers and RvS fail in stochastic environments." Advances in Neural Information Processing Systems 35 (2022): 38966-38979.
- Wang, Kerong, et al. "Bootstrapped transformer for offline reinforcement learning." Advances in Neural Information Processing Systems 35 (2022): 34748-34761.
- Zheng, Qinqing, Amy Zhang, and Aditya Grover. "Online decision transformer." International Conference on Machine Learning. PMLR, 2022.
- Xu, Mengdi, et al. "Prompting decision transformer for few-shot policy generalization." International Conference on Machine Learning. PMLR, 2022.
- Yamagata, Taku, Ahmed Khalil, and Raul Santos-Rodriguez. "Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL." International Conference on Machine Learning. PMLR, 2023.
- Hu, Shengchao, et al. "Graph decision transformer." arXiv preprint arXiv:2303.03747 (2023).
- Ma, Yi, et al. "Rethinking Decision Transformer via Hierarchical Reinforcement Learning." arXiv preprint arXiv:2311.00267 (2023).
- Wang, Yuanfu, et al. "Critic-guided decision transformer for offline reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 14. 2024.

Reference papers 4. Transformer-based MARL:
- Wen, M., Kuba, J., Lin, R., Zhang, W., Wen, Y., Wang, J., & Yang, Y. (2022). Multi-agent reinforcement learning is a sequence modeling problem. Advances in Neural Information Processing Systems, 35, 16509-16521.
- Meng, Linghui, et al. "Offline pre-trained multi-agent decision transformer." Machine Intelligence Research 20.2 (2023): 233-248.
- Zhang, Bin, et al. "Stackelberg decision transformer for asynchronous action coordination in multi-agent systems." arXiv preprint arXiv:2305.07856 (2023). Published as "Sequential Asynchronous Action Coordination in Multi-Agent Systems: A Stackelberg Decision Transformer Approach." Accepted by ICML 2024.

Reference papers: NLP:
- Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
- Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
- Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
- Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
- 2021-01: Prefix-Tuning: Optimizing Continuous Prompts for Generation.
- 2021-04: The Power of Scale for Parameter-Efficient Prompt Tuning.
- 2021-10: P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks.
- Wei, Jason, et al. "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." arXiv preprint arXiv:2201.11903 (2022). NeurIPS 2022.
- Ouyang, Long, et al. "Training language models to follow instructions with human feedback." Advances in Neural Information Processing Systems 35 (2022): 27730-27744.
- /index/chatgpt/ : "By January 2023, it had become what was then the fastest-growing consumer software application in history, gaining over 100 million users."
- Touvron, Hugo, et al. "Llama: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023).
- Touvron, Hugo, et al. "Llama 2: Open foundation and fine-tuned chat models." arXiv preprint arXiv:2307.09288 (2023).
- /meta-llama/llama3/blob/main/MODEL_CARD.md
- /index/gpt-4-research/
- Achiam, Josh, et al. "GPT-4 technical report." arXiv preprint arXiv:2303.08774 (2023).

Reference papers 5 & 6. NLP-Agent:
- /Significant-Gravitas/AutoGPT
- /yoheinakajima/babyagi
- /index/chatgpt-plugins/ : "We're also hosting two plugins ourselves, a web browser and code interpreter."