




Hierarchical Methods for Planning under Uncertainty

Thesis Proposal
Joelle Pineau

Thesis Committee:
Sebastian Thrun, Chair
Matthew Mason
Andrew Moore
Craig Boutilier, U. of Toronto
Integrating robots in living environments
The robot's role:
- Social interaction
- Mobile manipulation
- Intelligent reminding
- Remote-operation
- Data collection / monitoring

A broad perspective
GOAL = Selecting appropriate actions.
(Diagram: the USER + WORLD + ROBOT system has a hidden STATE; the robot receives OBSERVATIONS, maintains a belief state, and emits ACTIONS.)

Why is this a difficult problem? UNCERTAINTY
- Cause #1: Non-deterministic effects of actions
- Cause #2: Partial and noisy sensor information
- Cause #3: Inaccurate model of the world and the user

A solution: Partially Observable Markov Decision Processes (POMDPs)
(Diagram: states S1, S2, S3 linked by actions a1, a2, with each state emitting observations o1, o2.)

The truth about POMDPs
Bad news: finding an optimal POMDP action-selection policy is computationally intractable for complex problems.
Good news: many real-world decision-making problems exhibit structure inherent to the problem domain. By leveraging structure in the problem domain, I propose an algorithm that makes POMDPs tractable, even for large domains.

How is it done?
Use a "divide-and-conquer" approach: we decompose a large monolithic problem into a collection of loosely-related smaller problems (e.g. separate dialogue, health, social, and reminding managers).

Thesis statement
Decision-making under uncertainty can be made tractable for complex problems by exploiting hierarchical structure in the problem domain.

Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research

POMDPs within the family of Markov models

                                Control problem?
                                no              yes
  Uncertainty in         no     Markov Chain    MDP
  sensor input?          yes    HMM             POMDP

What are POMDPs?
Components:
- Set of states: s ∈ S
- Set of actions: a ∈ A
- Set of observations: o ∈ O
POMDP parameters:
- Initial belief: b0(s) = Pr(s0 = s)
- Transition probabilities: T(s,a,s') = Pr(s'|s,a)
- Observation probabilities: O(s,a,o) = Pr(o|s,a)
- Rewards: R(s,a)
(Diagram: a three-state example with per-state observation distributions, e.g. Pr(o1) = 0.5/0.9/0.2 and Pr(o2) = 0.5/0.1/0.8 in S1/S2/S3, and transition probabilities 0.5, 0.5, 1 under actions a1, a2.)
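The components and parameters above map directly onto a small container type. The sketch below is purely illustrative (the `POMDP` class name and field layout are mine, not the proposal's):

```python
from dataclasses import dataclass

@dataclass
class POMDP:
    """Container for the POMDP components and parameters listed above."""
    states: list          # S
    actions: list         # A
    observations: list    # O
    b0: dict              # initial belief, b0[s] = Pr(s0 = s)
    T: dict               # transitions, T[(s, a, s2)] = Pr(s2 | s, a)
    O: dict               # observations, O[(s, a, o)] = Pr(o | s, a)
    R: dict               # rewards, R[(s, a)]
```

The tiger problem on the next slide is exactly such a tuple, with |S| = 2, |A| = 3, |O| = 2.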
A POMDP example: The tiger problem
S1 "tiger-left": Pr(o = growl-left) = 0.85, Pr(o = growl-right) = 0.15
S2 "tiger-right": Pr(o = growl-left) = 0.15, Pr(o = growl-right) = 0.85
Actions = {listen, open-left, open-right}
Reward function:
- R(a = listen) = -1
- R(a = open-right, s = tiger-left) = 10
- R(a = open-left, s = tiger-left) = -100

What can we do with POMDPs?
1) State tracking: after an action, what is the state of the world, st? (Not so hard.)
2) Computing a policy: which action, aj, should the controller apply next? (Very hard!)
(Diagram: the world state evolves from st-1 to st; the control layer carries the belief bt-1 forward using the last action at-1 and the new observation ot.)

The tiger problem: State tracking
The belief vector starts uniform, b0 = (0.5, 0.5) over {tiger-left, tiger-right}.
After action = listen and obs = growl-left, the Bayes update yields b1 = (0.85, 0.15): the probability mass shifts towards tiger-left.

Policy optimization
Which action, aj, should the controller apply next?
In MDPs: the policy is a mapping from state to action, π: si → aj.
In POMDPs: the policy is a mapping from belief to action, π: b → aj.
Recursively calculate the expected long-term reward for each state/belief:
  V_t(b) = max_a [ Σ_s b(s) R(s,a) + γ Σ_o Pr(o | a,b) V_{t-1}(b_a_o) ]
Find the action that maximizes the expected reward:
  π_t(b) = argmax_a [ Σ_s b(s) R(s,a) + γ Σ_o Pr(o | a,b) V_{t-1}(b_a_o) ]
where b_a_o denotes the belief that results from b after action a and observation o.

The tiger problem: Optimal policy
(Diagram: over the belief vector from S1 "tiger-left" to S2 "tiger-right", the optimal policy is open-right when the belief is confidently tiger-left, listen in the uncertain middle region, and open-left when the belief is confidently tiger-right.)
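Both operations can be checked numerically on the tiger problem. Here is a minimal, self-contained sketch; note that it assumes the standard tiger dynamics (listening leaves the tiger in place, opening a door resets the problem), which the slides do not spell out:

```python
S = ["tiger-left", "tiger-right"]
A = ["listen", "open-left", "open-right"]
Z = ["growl-left", "growl-right"]

def T(s, a, s2):
    # Assumed dynamics: "listen" leaves the state unchanged; opening a
    # door resets the problem, re-placing the tiger uniformly at random.
    if a == "listen":
        return 1.0 if s == s2 else 0.0
    return 0.5

def O(s2, a, o):
    # Growls point to the correct side 85% of the time while listening.
    if a == "listen":
        correct = "growl-left" if s2 == "tiger-left" else "growl-right"
        return 0.85 if o == correct else 0.15
    return 0.5  # opening a door yields no information

def R(s, a):
    if a == "listen":
        return -1.0
    opened = "tiger-left" if a == "open-left" else "tiger-right"
    return -100.0 if s == opened else 10.0

def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to O(s',a,o) * sum_s T(s,a,s') b(s)."""
    b2 = {s2: O(s2, a, o) * sum(T(s, a, s2) * b[s] for s in S) for s2 in S}
    norm = sum(b2.values())
    return {s2: p / norm for s2, p in b2.items()}

b0 = dict.fromkeys(S, 0.5)
b1 = belief_update(b0, "listen", "growl-left")
print(b1)  # {'tiger-left': 0.85, 'tiger-right': 0.15}, matching the slide

# One-step expected rewards under b1: listening (-1) still beats
# open-right (-6.5) and open-left (-83.5), so the policy listens again.
for a in A:
    print(a, sum(R(s, a) * b1[s] for s in S))
```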
Complexity of policy optimization
Finite-horizon POMDPs are in the worst case doubly exponential: the number of candidate t-step policy trees grows as |A|^((|O|^t - 1)/(|O| - 1)).
Infinite-horizon undiscounted stochastic POMDPs are EXPTIME-hard, and may not be decidable.

The essence of the problem
How can we find good policies for complex POMDPs?
Is there a principled way to provide near-optimal policies in reasonable time?

Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research

A hierarchical approach to POMDP planning
Key idea: exploit hierarchical structure in the problem domain to break a problem into many "related" POMDPs.
What type of structure? Action set partitioning.

Action set partitioning
(Hierarchy: the root subtask Act is partitioned into the subtasks InvestigateHealth and Move; InvestigateHealth groups the primitive actions CheckPulse and CheckMeds, while Move groups AskWhere and the Navigate subtask, which holds Left, Right, Forward, Backward. Each subtask appears to its parent as a single abstract action.)
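A convenient encoding of such an action-set partition is a tree whose leaves are primitive actions and whose internal nodes are subtasks. A hypothetical sketch (the `Subtask` type and helper are mine, not the proposal's):

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    """Internal node of the action hierarchy. It owns a POMDP controller
    whose action set is its list of children, and it is exposed to its
    parent as a single abstract action."""
    name: str
    children: list = field(default_factory=list)  # Subtask or str (primitive)

act = Subtask("Act", [
    Subtask("InvestigateHealth", ["CheckPulse", "CheckMeds"]),
    Subtask("Move", ["AskWhere",
                     Subtask("Navigate", ["Left", "Right", "Forward", "Backward"])]),
])

def local_actions(node):
    # A controller chooses only among its immediate children: primitives
    # stay as-is, child subtasks appear as abstract actions.
    return [c.name if isinstance(c, Subtask) else c for c in node.children]

print(local_actions(act))  # ['InvestigateHealth', 'Move']
```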
Assumptions
1. Each POMDP controller has a subset of A0.
2. Each POMDP controller has the full state set S0 and observation set O0.
3. Each controller includes discriminative reward information.
4. We are given the action set partitioning graph.
5. We are given a full POMDP model of the problem: {S0, A0, O0, M0}.

The tiger problem: An action hierarchy
The root action act is split into {open-left, investigate}, and investigate into {listen, open-right}.
P_investigate = {S0, A_investigate, O0, M_investigate}, with A_investigate = {listen, open-right}.

Optimizing the "investigate" controller
(Locally optimal policy over the belief vector: open-right when the belief is confidently tiger-left, listen otherwise.)

The tiger problem: An action hierarchy (continued)
P_act = {S0, A_act, O0, M_act}, with A_act = {open-left, investigate}.
But... R(s, a = investigate) is not defined!

Modeling abstract actions
Insight: use the local policy of the corresponding low-level controller.
General form: R(si, ak) = R(si, Policy(controller_k, si))
Example: R(s = tiger-left, ak = investigate) = ?

               open-right   listen   open-left
tiger-left         10         -1       -100
tiger-right      -100         -1         10

Policy(investigate, s = tiger-left) = open-right, so R(s = tiger-left, ak = investigate) = 10.
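In code, the reward row of an abstract action follows mechanically from the child controller's local policy. A sketch under the tiger model above (the per-state policy is written out as a plain dict here; the proposal obtains it by solving the child POMDP):

```python
# Primitive reward table from the slide.
R = {
    ("tiger-left",  "open-right"):   10.0,
    ("tiger-left",  "listen"):       -1.0,
    ("tiger-left",  "open-left"):  -100.0,
    ("tiger-right", "open-right"): -100.0,
    ("tiger-right", "listen"):       -1.0,
    ("tiger-right", "open-left"):    10.0,
}

# Locally optimal policy of the "investigate" controller, per state
# (open-right when the tiger is on the left, keep listening otherwise).
investigate_policy = {"tiger-left": "open-right", "tiger-right": "listen"}

def abstract_reward(R, policy, s):
    """General form from the slide: R(s, a_k) = R(s, Policy(controller_k, s))."""
    return R[(s, policy[s])]

print(abstract_reward(R, investigate_policy, "tiger-left"))   # 10.0
print(abstract_reward(R, investigate_policy, "tiger-right"))  # -1.0
```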
Optimizing the "act" controller
(Locally optimal policy over the belief vector: open-left when the belief is confidently tiger-right, investigate otherwise.)

The complete hierarchical policy
(Over the belief vector, the stitched-together hierarchical policy is open-left / listen / open-right, matching the structure of the optimal flat policy computed earlier.)

Results for larger simulation domains
(Results figure.)

Related work on hierarchical methods
- Hierarchical HMMs: Fine et al., 1998.
- Hierarchical MDPs: Dayan & Hinton, 1993; Dietterich, 2000; McGovern et al., 1997; Parr & Russell, 1998; Singh, 1992.
- Loosely-coupled MDPs: Boutilier et al., 1997; Dean & Lin, 1995; Meuleau et al., 1998; Singh & Cohn, 1998; Wang & Mahadevan, 1999.
- Factored-state POMDPs: Boutilier et al., 1999; Boutilier & Poole, 1996; Hansen & Feng, 2000.
- Hierarchical POMDPs: Castanon, 1997; Hernandez-Gardiol & Mahadevan, 2001; Theocharous et al., 2001; Wiering & Schmidhuber, 1997.

Outline
- Problem motivation
- Partially observable Markov decision processes
- The hierarchical POMDP algorithm
- Proposed research

Proposed research
1) Algorithmic design
2) Algorithmic analysis
3) Model learning
4) System development and application

Research block #1: Algorithmic design
Goal 1.1: Developing/implementing the hierarchical POMDP algorithm.
Goal 1.2: Extending H-POMDP for factorized state representation.
Goal 1.3: Using state/observation abstraction.
Goal 1.4: Planning for controllers with no local reward information.

Goal 1.3: State/observation abstraction
Assumption #2: "Each POMDP controller has the full state set S0 and observation set O0."
Can we reduce the number of states/observations, |S| and |O|? Yes! Each controller only needs a subset of the state/observation features. (Example: the Navigate controller, with actions {Left, Right, Forward, Backward}, and the InvestigateHealth controller, with actions {CheckPulse, CheckMeds}, each need only the features those actions depend on.)
What is the computational speed-up?
(Table: time complexity of the flat POMDP vs. the recursive hierarchical solution and its upper bound.)
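One way to realize this abstraction is to project each full state onto the feature subset a given controller actually uses, merging states that agree on those features. A hypothetical sketch (the feature names echo the implemented scenario; the projection rule is my illustration, not the proposal's algorithm):

```python
from collections import defaultdict

def project(states, keep):
    """Map each full state (a dict of features) to the tuple of retained
    feature values; states that agree on `keep` become one abstract state."""
    return {sid: tuple(feats[f] for f in keep) for sid, feats in states.items()}

# The Navigate controller needs RobotLocation but not ReminderGoal.
states = {
    0: {"RobotLocation": "room",    "ReminderGoal": "none"},
    1: {"RobotLocation": "room",    "ReminderGoal": "medsX"},
    2: {"RobotLocation": "kitchen", "ReminderGoal": "none"},
}

groups = defaultdict(list)
for sid, abs_state in project(states, keep=["RobotLocation"]).items():
    groups[abs_state].append(sid)

print(dict(groups))  # {('room',): [0, 1], ('kitchen',): [2]}: |S| drops from 3 to 2
```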
Goal 1.4: Local controller reward information
Assumption #3: "Each controller includes some amount of discriminative reward information."
Can we relax this assumption? Possibly: use reward shaping to select a policy-invariant reward function.
What is the benefit? H-POMDP could solve problems with sparse reward functions.
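Potential-based reward shaping (Ng, Harada & Russell, 1999) is the classic policy-invariant transformation, and is one plausible reading of this goal; the sketch below shows the idea on states for simplicity, as my own illustration:

```python
GAMMA = 0.95  # assumed discount factor, for illustration only

def shaped_reward(R, phi, s, a, s2):
    """Potential-based shaping: R'(s,a,s') = R(s,a) + gamma*phi(s') - phi(s).
    For any potential function phi over states, the optimal policy is
    unchanged, while a sparse reward signal becomes denser."""
    return R(s, a) + GAMMA * phi(s2) - phi(s)

# Toy example: a sparse reward that only pays at the goal...
R = lambda s, a: 1.0 if s == "goal" else 0.0
# ...densified by a potential that grows as the state nears the goal.
phi = {"start": 0.0, "near": 0.5, "goal": 1.0}.get

print(shaped_reward(R, phi, "start", "go", "near"))  # 0.475 instead of 0.0
```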
Research block #2: Algorithmic analysis
Goal 2.1: Evaluating performance of the H-POMDP algorithm.
Goal 2.2: Quantifying the loss due to the hierarchy.
Goal 2.3: Comparing different possible decompositions of a problem.

Goal 2.1: Performance evaluation
How does the hierarchical POMDP algorithm compare to:
- Exact value function methods: Sondik, 1971; Monahan, 1982; Littman, 1996; Cassandra et al., 1997.
- Policy search methods: Hansen, 1998; Kearns et al., 1999; Ng & Jordan, 2000; Baxter & Bartlett, 2000.
- Value approximation methods: Parr & Russell, 1995; Thrun, 2000.
- Belief approximation methods: Nourbakhsh et al., 1995; Koenig & Simmons, 1996; Hauskrecht, 2000; Roy & Thrun, 2000.
- Memory-based methods: McCallum, 1996.
Consider problems from the POMDP literature and the dialogue management domain.

Goal 2.2: Quantifying the loss
The hierarchical POMDP planning algorithm provides an approximately-optimal policy. How "near-optimal" is the policy?
Subject to some (very restrictive) conditions, the value function of the top-level controller is an upper bound on the value of the approximation: V_actual(b) <= V_top(b).
Can we loosen the restrictions? Tighten the bound? Find a lower bound?

Goal 2.3: Comparing different decompositions
Assumption #4: "We are given an action set partitioning graph."
What makes a good hierarchical action decomposition?
Comparing decompositions is the first step towards automatic decomposition.
(Diagram: two alternative action hierarchies over {Manufacture, Examine, Inspect, Replace}, one with abstract actions a1, a2, the other with a1, a2, a3.)

Research block #3: Model learning
Goal 3.1: Automatically generating good action hierarchies.
Assumption #4: "We are given an action set partitioning graph." Can we automatically generate a good hierarchical decomposition? Maybe: it is being done for hierarchical MDPs.
Goal 3.2: Including parameter learning.
Assumption #5: "We are given a full POMDP model of the problem." Can we introduce parameter learning? Yes! Maximum-likelihood parameter optimization (Baum-Welch) can be used for POMDPs.

Research block #4: System development and application
Goal 4.1: Building an extensive dialogue manager.
(Diagram: the Dialogue Manager mediates between the user (touch-screen input and messages, speech utterances) and three modules: a reminding module (reminder messages), a robot module (robot sensor readings, motion commands), and a teleoperation module (status information, remote-control commands, facemail operations).)

An implemented scenario
Physiotherapy, patient room, robot home.
Problem size: |S| = 288, |A| = 14, |O| = 15.
State features: {RobotLocation, UserLocation, UserStatus, ReminderGoal, UserMotionGoal, UserSpeechGoal}.
Test subjects: 3 elderly residents in an assisted living facility.

Contributions
Algorithmic contribution: a novel POMDP algorithm based on hierarchical structure; enables use of POMDPs for much larger problems.
Application contribution: the application of POMDPs to dialogue management is novel; allows the design of robust robot behavioural managers.

Research schedule
1) Algorithmic design/implementation: fall 01
2) Algorithmic analysis: spring/summer 02
3) Model learning: spring/summer/fall 02
4) System development and application: ongoing
5) Thesis writing: fall 02 / spring 03

Questions?

A simulated robot navigation example
Domain size: |S| = 11, |A| = 6, |O| = 6.
(Action hierarchy: Act splits into the subtasks GetReward(t) and Navigate(t), over the primitive actions ReadMap, Read, OpenDoor, GoLeft, GoRight, GoBack, GoForward; "($$)" marks the rewarded branches.)

A dialogue management example
Domain size: |S| = 20, |A| = 30, |O| = 27.
Action hierarchy: Act splits into six subtasks:
- Move: AskGoWhere, GoToRoom, GoToKitchen, GoToFollow, VerifyRoom, VerifyKitchen, VerifyFollow
- Greet: GreetGeneral, GreetMorning, GreetNight, RespondThanks
- CheckWeather: AskWeatherTime, SayCurrent, SayToday, SayTomorrow, SayTime
- DoMeds: StartMeds, NextMeds, ForceMeds, QuitMeds
- Phone: AskCallWho, Call911, CallNurse, CallRelative, Verify911, VerifyNurse, VerifyRelative
- CheckHealth: AskHealth, OfferHelp

Action hierarchy for implemented scenario
Subtasks under Act: Remind, Assist, Rest, Move, Contact, Inform.
Primitive actions: BringtoPhysio, CheckUserPresent, DeliverUser, SayWeather, VerifyRequest, SayTime, RemindPhysio, PublishStatus, RingBell, GotoRoom, VerifyBring, VerifyRelease, Recharge, GotoHome.

Sondik's parts manufacturing problem
Two of the candidate decompositions over the primitive actions {Manufacture, Examine, Inspect, Replace}:
- Decomposition 1: abstract actions a1, a2, a3.
- Decomposition 2: abstract actions a1, a2.
Plus 5 more decompositions.

Manufacturing task results
(Results figure.)

Using state/observation abstraction
State features: ReminderGoal = {none, medsX}; CommunicationGoal = {none, personX}; UserHealth = {good, poor, emergency}.
Each controller keeps only the features its actions depend on:
- CheckHealth (actions: AskHealth, OfferHelp) uses UserHealth = {good, poor, emergency}.
- Phone (actions: AskCallWho, CallHelp, CallNurse, CallRelative, VerifyHelp, VerifyNurse, VerifyRelative) uses CommunicationGoal = {none, nurse, 911, relative}.
- DoMeds uses ReminderGoal = {none, medsX}.
Related work on robot planning and control
- Manually-scripted dialogue strategies: Denecke & Waibel, 1997; Walker et al., 1998.
- Markov decision processes (MDPs) for dialogue management: Levin et al., 1998; Fromer, 1998; Walker et al., 1998; Goddeau & Pineau, 2000; Singh et al., 2000; Walker, 2000.
- Robot interfaces: Torrance, 1994; Asoh et al., 1997.
- Classical planning: Fikes & Nilsson, 1971; Simmons, 1987; McAllester & Rosenblitt, 1991; Penberthy & Weld, 1992; Kushmerick et al., 1995; Veloso et al., 1995; Smith & Weld, 1998.
- Execution architectures: Firby, 1987; Musliner et al., 1993; Simmons, 1994; Bonasso & Kortenkamp, 1996.

Decision-theoretic planning models
(Figure: taxonomy of decision-theoretic planning models.)

The tiger problem: Value function solution
(Figure: value function V over the belief from s = tiger-left to s = tiger-right, formed as the upper surface of the linear value segments for open-right, listen, and open-left.)

Optimizing the "investigate" controller
(Figure: value function over the belief for the investigate controller, with segments for open-right and listen.)

Optimizing the "act" controller
(Figure: value function over the belief for the act controller, with segments for open-left and investigate.)