AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model

Tohid Atashbar and Rui (Aruhan) Shi

WP/23/40

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate.
The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

2023
FEB

© 2023 International Monetary Fund    WP/23/40
IMF Working Paper*
Strategy, Policy and Review Department

AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model
Prepared by Tohid Atashbar and Rui (Aruhan) Shi
Authorized for distribution by Stephan Danninger
February 2023

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate. The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

ABSTRACT: This study seeks to construct a basic reinforcement learning-based AI-macroeconomic simulator. We use a deep RL (DRL) approach (DDPG) in an RBC macroeconomic model. We set up two learning scenarios, one of which is deterministic without the technological shock and the other is stochastic. The objective of the deterministic environment is to compare the learning agent's behavior to a deterministic steady-state scenario. We demonstrate that in both deterministic and stochastic scenarios, the agent's choices are close to their optimal value. We also present cases of unstable learning behaviours. This AI-macro model may be enhanced in future research by adding additional variables or sectors to the model or by incorporating different DRL algorithms.
RECOMMENDED CITATION: Atashbar, T. and Shi, R. A. 2023. "AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model", IMF Working Papers, WP/23/40.
JEL Classification Numbers: C63, C54, D83, D87, E37

Keywords: Reinforcement learning; Deep reinforcement learning; Artificial intelligence; RL; DRL; Learning algorithms; Macro modeling; RBC; Real business cycles; DDPG; Deep deterministic policy gradient; Actor-critic algorithms

Author's E-Mail Addresses: tatashbar@; ashi@

* The authors would like to thank Stephan Danninger for his helpful comments and suggestions. We appreciate the views and suggestions provided by Mico Mrkaic, Dmitry Plotnikov, Sergio Rodriguez and attendees at the IMF SPR Macro Policy Division Brownbag Seminar. Comments by Allan Dizioli are also gratefully acknowledged. All errors remain our own.
WORKING PAPERS

AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model

Prepared by Tohid Atashbar and Rui (Aruhan) Shi
Contents

GLOSSARY
INTRODUCTION
AN OVERVIEW OF THE LITERATURE
A REAL BUSINESS CYCLE (RBC) MODEL
  Households
  Firms
  Functional forms and parameters
  A deterministic steady state
AI EXPERIMENTS
  Experiment I: deterministic environment
  Experiment II: stochastic environment
  Issues during learning
CONCLUSION
ANNEX I. DDPG ALGORITHM
REFERENCES

FIGURES
Figure 1. SL, UL and RL in ML
Figure 2. Labor hours during training (200 episodes)
Figure 3. Labor hour series during training and testing
Figure 4. Distance to the steady state (SS) values for labor hour and consumption
Figure 5. Productivity shock series z_t
Figure 6. Simulated series during 100 testing periods
Figure 7. Labor hour choice before and after learning (200 episodes)
Figure 8. Distance to deterministic steady states (SS) for labor hour and consumption
Figure 9. Distance to deterministic steady states (SS) for output and investment
Figure 10. Output per unit of labor
Figure 11. Investment per unit of labor

TABLES
Table 1. Baseline parameters for RBC model
Table 2. Algorithm related parameters
Table 3. RL setup of the RBC model
Glossary

AGI     Artificial General Intelligence
AI      Artificial Intelligence
ANN     Artificial Neural Networks
DDPG    Deep Deterministic Policy Gradient
DL      Deep Learning
DNN     Deep Neural Network
DPG     Deterministic Policy Gradient
DQN     Deep Q-Network
DRL     Deep Reinforcement Learning
MADDPG  Multi-Agent Deep Deterministic Policy Gradient
RBC     Real Business Cycle
RL      Reinforcement Learning
SAC     Soft Actor-Critic
SL      Supervised Learning
TD3     Twin Delayed DDPG
UL      Unsupervised Learning
Introduction

Macroeconomic modeling is the process of constructing a model that describes the behavior of a macroeconomic system. This process can be used to develop predictions about the future behavior of the system, to understand the relationships between different variables in the system, or to simulate behavior.

Artificial intelligence (AI) is a branch of computer science that deals with the design and development of intelligent computer systems. AI research deals with the question of how to create programs that are capable of intelligent behavior, i.e., the kind of behavior that is associated with human beings, such as reasoning, learning, problem-solving, and acting autonomously.

The two fields could be conceptually combined, as AI techniques could be used to develop more accurate macroeconomic models, or one could use macroeconomic models to help design artificial general intelligent systems that are better able to simulate economic (or more broadly social) behaviors, among many other tasks. AI can be used to automatically identify relationships between variables, or to develop new ways of representing economic systems. AI can also be used to develop methods for automatically learning from data, which can be used to improve the accuracy of predictions. AI could also be used to develop more sophisticated models that take into account a wider range of factors, including non-economic factors such as political instability or weather patterns.

An increasing body of work leverages machine learning for forecasting (Atashbar and Shi, 2022), besides some recent developments in optimization, market design, and algorithmic game theory, but AI's impact on economics, especially in the field of macroeconomic modeling, has been modest so far. This has been caused by a combination of factors, including the relative newness of the field, the difficulty of designing AI agents capable of realistically imitating human behavior in an economy, the lack of data available for training AI models, and the lack of computational resources needed to train and run large macroeconomic simulations.

But with the emergence of a new generation of AI models called reinforcement learning (RL), there is a growing belief that AI will have a transformative impact on macroeconomic modeling (Tilbury, 2022). This is primarily because RL models are much better suited than previous AI models for imitating human behavior. In addition, RL models require much less data to be trained (they generate their own data through interaction with their environment) and could be much more efficient in terms of computational resources in specific settings or algorithms.

The goal of this paper is to build a relatively simple and extendable macroeconomic model based on RL that can generate realistic macroeconomic dynamics that are comparable to models under the rational expectations assumption, while not imposing unrealistic restrictions like perfect foresight on economic agents. The resulting model will be used as a prototype for future extensions in policy experiments, or customized to better match the conditions, shocks or data of a particular or the global economy.
To this end, we implement an advanced deep RL (DRL) algorithm, the deep deterministic policy gradient (DDPG), in a real business cycle (RBC) macroeconomic model. We chose the DDPG algorithm for this basic model (with an eye on possible extensions of the model in the future) for several reasons (Sutton and Barto (2018), Graesser and Keng (2019), Zai and Brown (2020) and Powell (2021)):

First, it is one of the modern RL algorithms that can be applied to continuous action space problems, which is crucial for modeling macroeconomic variables. Second, it is one of the RL algorithms that can handle high-dimensional state and action spaces, which are typical in macroeconomic models (e.g., the number of different economic sectors). Third, the separation of policy and value functions in the algorithm allows for analyzing each component independently during the learning process. Fourth, the DDPG algorithm is one of the few RL algorithms that can be applied to non-stationary problems, which are common in macroeconomic modeling.

Fifth, it is one of the few RL algorithms that can be applied to problems with a very long time horizon, which might be important for macroeconomic modeling. Sixth, the DDPG algorithm is one of the few RL algorithms that can be applied, in specific settings, to partially observable Markov decision process (POMDP) problems or, in other words, to problems with a limited observation window or limited information settings. This could be important for some macroeconomic modeling work, since the observation window is often limited by the frequency of the data. Finally, the DDPG algorithm has been shown to perform well in a variety of challenging problems in the RL literature. However, similar to other RL algorithms, the DDPG algorithm is also known to be unstable in some settings and can diverge if the learning process is not properly tuned.
We find that the RL-augmented RBC model performs similarly to the RBC model under the rational expectations assumption once the learning representative agent has learnt over many simulation periods. This is achieved starting from a stage in which the representative agent does not understand the economic structure, its preferences, or how the economy transitions over time. However, the training takes a significant number of simulation periods, in part because the agent needs to generate its own experience to learn from. To simulate realistic households' behaviors that match empirical learning periods, further work is needed to calibrate the parameters, or to transfer past experience to the learning agent as a starting point of learning.

These encouraging results need to be put in perspective. In addition to the rudimentary (but extendible) character of our model structure, a disadvantage of our work is also the restricted scope of RBC models. The business cycle variations are only propagated through an exogenous productivity shock. The empirically implied magnitude of the true technology shock is likely to be smaller than what RBC models predict.

Unemployment is also explained in an overly simplified manner: intertemporal substitution between labor and leisure explains employment variations. For workers to gain high utility, it is better to work more in productive periods, and less in unproductive periods. However, the RBC model is the core component of the DSGE models that are widely applied in policy institutions and central banks. It is scalable and easily built on. It is well known and studied, and thus it is easy to compare learning results with existing theory.

We hope this work will encourage further research in the application of AI and deep RL to macroeconomic problems and will open up a new direction of research to combine deep RL with standard macroeconomic models. In particular, we expect it to be a base and extension for more advanced applications at the Fund that explore the use of deep RL for macroeconomic policy analysis.

The rest of the paper is organized as follows. Section I provides a brief literature review of AI and RL/deep RL applications in macroeconomic policy. Section II describes the RBC model. Section III introduces the DRL algorithm, the environment and the AI experiments we conduct, the results, and the issues during learning, and Section IV concludes.
An overview of the literature

Artificial intelligence (AI) is a growing field of computer science focused on creating intelligent computer systems, or machines, that can reason, learn, and act autonomously. AI systems are designed to mimic human cognitive abilities, such as learning, problem solving, and natural language processing.

The term "artificial intelligence" was first coined in 1956 by computer scientist John McCarthy (Andresen, 2002). AI research is highly interdisciplinary, involving disciplines such as computer science, psychology, neuroscience, linguistics, philosophy, and anthropology.

There are three broad categories of AI systems (Goertzel, 2007):
Narrow AI or weak AI systems are designed to perform a specific task, such as facial recognition or modeling financial markets.

General AI or strong AI systems are designed to perform a wide range of tasks, such as reasoning and planning.

Super AI or artificial general intelligence (AGI) refers to hypothetical AI systems that match or exceed human intelligence.
AI is already being heavily used across multiple fields and industries including healthcare, retail, finance, image processing, autonomous driving, and many more. The application of AI in economics is still in its early stages and has yet to be sufficiently developed. Nonetheless, some theorize that sooner or later, AI-economist machines could catch up with human economists in many areas (Atashbar, 2021a, 2021b). AI has been used in economics mostly for predictions and forecasts, market analysis and the impact analysis of alternative policies. Lu & Zhou (2021), Ruiz-Real et al. (2021), Goldfarb et al. (2019), Cao (2020), and Veloso et al. (2021) look at how AI is or could be used in economics and finance.

Machine learning (ML) is a branch of artificial intelligence that uses artificial neural networks (ANN) to learn from data, without being explicitly programmed. An ANN is a data-driven approach to machine learning that is based on the idea of artificial neurons, or nodes, that are connected in layers. The input layer receives the input data, and the output layer produces the output. The hidden layers in between perform the learning by adjusting the weights of the connections between the nodes. Deep learning (DL) is a subset of machine learning that uses a deep neural network (DNN) to model complex patterns in data. A DNN is an ANN with a deep architecture. This means that the neural network contains not only an input layer and an output layer, but also one or more layers in between to add further non-linearities in order to recognize complex patterns in a dataset.
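To make the layered structure concrete, below is a minimal sketch (ours, not the paper's) of a small feed-forward network; it assumes PyTorch is available, and the layer sizes are purely illustrative.

```python
import torch
import torch.nn as nn

# A small feed-forward neural network: an input layer, two hidden layers that
# add non-linearities, and an output layer.
class SmallDNN(nn.Module):
    def __init__(self, n_inputs=3, n_hidden=32, n_outputs=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),   # input -> hidden layer 1
            nn.ReLU(),                       # non-linearity
            nn.Linear(n_hidden, n_hidden),   # hidden layer 2
            nn.ReLU(),
            nn.Linear(n_hidden, n_outputs),  # hidden -> output layer
        )

    def forward(self, x):
        return self.net(x)

# The weights of the connections between nodes are adjusted during training,
# typically by gradient descent on a loss function.
model = SmallDNN()
example_input = torch.randn(1, 3)
print(model(example_input))
```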
There are three general approaches to the learning processes in machine learning:

Supervised learning (SL): The machine is provided with a set of training data, which includes both the input data and the desired output. The data is labeled. The machine is then able to learn and generalize from this data in order to produce the desired output for new data. The main applications of supervised learning are classification, regression, and prediction.

Unsupervised learning (UL): The machine is provided with a set of input data, but not the desired output. The input is not labeled. The machine must then learn to find patterns and relationships in the data in order to produce the desired output. Semi-supervised learning combines supervised and unsupervised learning. This means that the training dataset contains both labelled data (i.e., every piece of input data is attached to a desired output) and unlabeled data (i.e., input data is not attached to a desired output). The main applications of unsupervised learning are clustering, dimensionality reduction (e.g., principal components), and association rule learning.

Reinforcement learning (RL): It is different from both supervised and unsupervised learning in that it is not given a set of training data. The machine is given a set of rules or objectives, and it must learn how to best achieve these objectives through repeated interactions with the environment. The main applications of reinforcement learning are control, robotics, optimization and gaming. A schematic sketch of this interaction loop is shown below.
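The following sketch shows the generic structure of RL training; `agent` and `env` are hypothetical objects with a reset/step and act/update interface, not the implementation used in this paper.

```python
# A schematic reinforcement learning loop: the agent repeatedly interacts with
# an environment, receives rewards, and updates its behavior from that experience.
def train(agent, env, n_episodes=200, max_steps=100):
    for episode in range(n_episodes):
        state = env.reset()                               # start a new episode
        for t in range(max_steps):
            action = agent.act(state)                     # choose an action from the current policy
            next_state, reward, done = env.step(action)   # environment responds
            agent.update(state, action, reward, next_state, done)  # learn from experience
            state = next_state
            if done:
                break
```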
Figure 1. SL, UL and RL in ML
Source: authors' construction
Deep reinforcement learning (DRL) is a machine learning technique that combines reinforcement learning (RL) with deep learning (DL), meaning that it uses a DNN to represent the RL agent (Li, 2017). This approach is used to solve problems that are too difficult for simple RL algorithms alone. For an introduction to the theory and several algorithms in RL/DRL, see Atashbar and Shi (2022a).
Surveys by Athey (2018), Cameron (2019), Nosratabadi et al. (2020) and Hull (2021) provide a comprehensive review of the methods and use cases of ML and DL in economics. The application of RL and DRL in economics has a relatively short history and is in its early stages. The literature on deep reinforcement learning in economics mainly focuses on the application of deep reinforcement learning in microeconomic settings.

Reinforcement learning has been applied to various economic problems, such as dynamic pricing in electricity markets, auction theory, portfolio management and asset pricing.

Feng et al. (2018) model the rules of an auction as a neural network and use deep learning for the automated design of optimal auctions. They discover new auctions with high revenue for multi-unit auctions with private budgets, including problems with unit-demand bidders. Zheng et al. (2020) employ reinforcement learning to examine and decide on the actions of agents and a social planner in a gather-and-build environment. They demonstrate that AI-driven tax policies enhance the trade-off between equality and productivity over baseline policies.

Dütting et al. (2021) model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard machine learning pipelines. They demonstrate generalization limits and describe extensive experiments, recovering essentially all known analytical solutions for multi-item settings, and propose new mechanisms for settings in which the optimal mechanism is unknown.
While still limited, there is also a growing body of literature on the application of reinforcement learning to macroeconomic models. In the macroeconomics literature, deep RL algorithms have largely been used in a few domains. One of them is to use reinforcement algorithms to find the optimal possible policy or policy response function, as in Hinterlang and Tänzer (2022) and Covarrubias (2022). This field encompasses general equilibrium model solving as well, as demonstrated by Curry et al. (2022).

The learnability of rational expectations solutions in a general equilibrium model with multiple equilibria is also a topic that Chen et al. (2021) study. Using a representative agent in a monetary model with multiple equilibria, they demonstrate that the RL agent can locally converge to all of the stable states that the monetary model describes.

Modeling rationality and bounded rationality is another area of emphasis. Hill et al. (2021) demonstrate how to solve three rational expectations equilibrium models using discrete heterogeneous agents as opposed to a continuum of agents or a single representative agent. Shi (2021) investigates RL agents' consumption-saving behavior in a stochastic growth setting. She focuses on the differences in learning behaviors that occur when RL agents vary in terms of their exploration levels, and how this affects the convergence of optimal policy.

Similar to previous research, our work adds additional evidence testing a DRL algorithm in a macroeconomic model. However, we implement a representative DRL agent in an RBC model, which serves as a fundamental building block for the commonly used New Keynesian DSGE models.
A Real Business Cycle (RBC) Model

The baseline RBC model contains identical and infinitely lived households and firms. The business cycle fluctuations are generated by real shocks, i.e., a technology shock to productivity. In this specification, the households own the firms and rent out capital. The firms issue both debt (bonds) and equity (dividends).

Households
A household makes consumption-saving and work-leisure decisions. He maximizes expected utility:

$$E_0 \sum_{t=0}^{\infty} \beta^t u(c_t, 1-h_t)$$

subject to the constraints:

$$x_t + c_t + b_{t+1} \leq w_t h_t + r_t k_t + R_t b_t + \Pi_t$$

$$k_{t+1} \leq (1-\delta) k_t + x_t$$

$$k_t \geq 0$$

$k_0$ is given and the maximization also satisfies the transversality condition.
$c_t$ denotes consumption, $x_t$ denotes investment, $b_{t+1}$ denotes bond holding, $w_t$ denotes the hourly wage rate, $h_t$ denotes hours worked, $r_t$ denotes the return on capital, $k_t$ denotes capital, $R_t$ denotes the interest rate on bond holding, and $\Pi_t$ denotes the dividend payment.

$h_t \in [0,1]$ in period $t$, and the consumer receives utility from leisure.

The choices the consumer makes at time $t$ are $(x_t \text{ or } k_{t+1},\, c_t,\, b_{t+1},\, h_t)$, given time $t$ information and the interest rate on bonds, $R_{t+1}$.
Optimization under Rational Expectations

This section, as well as Section B.1, derives the optimization conditions under the rational expectations assumption. The aim is to compare learning results of a DRL RBC model with the rational expectations solution. In implementing a DRL algorithm, the first order conditions, including the Euler equation, are not required. The representative household's Lagrangian is:
$$\mathcal{L} = E_0 \sum_{t=0}^{\infty} \beta^t \left\{ u(c_t, h_t) + \lambda_t \left( w_t h_t + r_t k_t + R_t b_t + \Pi_t - c_t - k_{t+1} + (1-\delta)k_t - b_{t+1} \right) \right\}$$

The first order conditions are:

$$\frac{\partial \mathcal{L}}{\partial c_t} = 0 \;\leftrightarrow\; u_c(c_t, h_t) = \lambda_t$$

$$\frac{\partial \mathcal{L}}{\partial h_t} = 0 \;\leftrightarrow\; u_h(c_t, h_t) = \lambda_t w_t$$

$$\frac{\partial \mathcal{L}}{\partial k_{t+1}} = 0 \;\leftrightarrow\; \lambda_t = \beta E_t \lambda_{t+1} \{ r_{t+1} + (1-\delta) \}$$

$$\frac{\partial \mathcal{L}}{\partial b_{t+1}} = 0 \;\leftrightarrow\; \lambda_t = \beta E_t \lambda_{t+1} R_{t+1}$$

Combining $\frac{\partial \mathcal{L}}{\partial c_t}$ with $\frac{\partial \mathcal{L}}{\partial k_{t+1}}$ and with $\frac{\partial \mathcal{L}}{\partial b_{t+1}}$ yields:

$$u_c(c_t, h_t) = \beta E_t u_c(c_{t+1}, h_{t+1})(r_{t+1} + 1 - \delta)$$

$$u_c(c_t, h_t) = \beta E_t u_c(c_{t+1}, h_{t+1}) R_{t+1}$$

Combining $\frac{\partial \mathcal{L}}{\partial c_t}$ with $\frac{\partial \mathcal{L}}{\partial h_t}$ yields:

$$u_h(c_t, h_t) = w_t u_c(c_t, h_t)$$
Firms

A profit maximizing firm's problem is:

$$\max_{K_t, H_t} \; e^{z_t} F(K_t, H_t) - w_t H_t - r_t K_t$$

where $K_t$ is the capital input, $H_t$ is the labour input, $F$ is a neoclassical production function, such as the Cobb-Douglas production function, and $z_t$ follows an AR(1) process:

$$z_t = \rho z_{t-1} + \epsilon_t$$

where $\epsilon_t$ is sampled from a white noise process.
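As a small illustration, the productivity process can be simulated directly from this equation. The sketch below is ours and assumes NumPy, using the baseline values ρ = 0.95 and σ_ε = 0.007 reported later in Table 1.

```python
import numpy as np

def simulate_productivity(T=100, rho=0.95, sigma_eps=0.007, z0=0.0, seed=0):
    """Simulate z_t = rho * z_{t-1} + eps_t, with white-noise eps_t ~ N(0, sigma_eps^2)."""
    rng = np.random.default_rng(seed)
    z = np.empty(T)
    z_prev = z0
    for t in range(T):
        eps = rng.normal(0.0, sigma_eps)   # white-noise innovation
        z_prev = rho * z_prev + eps        # AR(1) update
        z[t] = z_prev
    return z

z_series = simulate_productivity()  # e.g., the kind of series plotted in Figure 5
```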
Optimization under Rational Expectations

The firm's first order conditions give the wage rate and capital rental rate equations:

$$w_t = e^{z_t} F_H(K_t, H_t)$$

$$r_t = e^{z_t} F_K(K_t, H_t)$$
The debt the firm issues is indeterminate in this setup.
Functional forms and parameters

Table 1 presents baseline parameters following Cooley and Prescott (1995) for the US data.
Table 1. Baseline parameters for RBC model

Utility function parameters: $\chi = 1$ (logarithmic utility), $\alpha = 0.64$. Relevant equations: $u(c_t, h_t) = \dfrac{\left(c_t^{1-\alpha}(1-h_t)^{\alpha}\right)^{1-\chi}}{1-\chi}$, which for $\chi = 1$ reduces to $u(c_t, h_t) = (1-\alpha)\ln c_t + \alpha \ln(1-h_t)$.

Production function: $\theta = 0.4$. Relevant equation: $F(K, H) = K^{\theta} H^{1-\theta}$.

Discount rate $\beta$: 0.99. Relevant equation: $E_0 \sum_{t=0}^{\infty} \beta^t u(c_t, 1-h_t)$.

Autoregressive parameter $\rho$: 0.95. Relevant equation: $z_t = \rho z_{t-1} + \epsilon_t$.

Standard deviation of $\epsilon_t$, $\sigma_\epsilon$: 0.007. Relevant equation: $z_t = \rho z_{t-1} + \epsilon_t$.

Capital depreciation $\delta$: 0.012. Relevant equation: $k_{t+1} \leq (1-\delta)k_t + x_t$.
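For concreteness, the calibrated period utility (the logarithmic case χ = 1 in Table 1) can be written as a small function. This is only our own sketch of the per-period payoff; the treatment of infeasible choices is an assumption, not the paper's specification.

```python
import numpy as np

ALPHA = 0.64   # leisure weight in utility (Table 1)

def period_utility(c, h, alpha=ALPHA):
    """Logarithmic utility u(c, h) = (1 - alpha) * ln(c) + alpha * ln(1 - h)."""
    if c <= 0 or not (0 <= h < 1):
        return -np.inf   # assumption: infeasible choices receive an arbitrarily low payoff
    return (1 - alpha) * np.log(c) + alpha * np.log(1 - h)

print(period_utility(c=5.0, h=0.3))  # example evaluation; the inputs are illustrative
```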
A deterministic steady state

Assume the parameter values and functional forms presented in section C. At a deterministic steady state, $z_t = 0$, $k_{t+1} = k_t = k^*$, $c_{t+1} = c_t = c^*$. The first order conditions in sections A.1 and B.1 become the following steady state conditions:
$$\frac{1}{\beta} - 1 + \delta = \theta \left(\frac{k^*}{h^*}\right)^{\theta - 1} \;\leftrightarrow\; \frac{k^*}{h^*} = \left(\frac{\frac{1}{\beta} - 1 + \delta}{\theta}\right)^{\frac{1}{\theta - 1}} \;(= 124.7)$$

$$y^* = \left(\frac{k^*}{h^*}\right)^{\theta} h^* \;\leftrightarrow\; \frac{y^*}{h^*} = \left(\frac{k^*}{h^*}\right)^{\theta} \;(= 6.89)$$

$$i^* = \delta k^* = \delta \left(\frac{k^*}{h^*}\right) h^* \;\leftrightarrow\; \frac{i^*}{h^*} = \delta \left(\frac{k^*}{h^*}\right) \;(= 1.5)$$
The accounting identity gives the value of consumption¹:

$$c^* = y^* - i^* \;\leftrightarrow\; \frac{c^*}{h^*} = \frac{y^*}{h^*} - \frac{i^*}{h^*} \;(= 5.39)$$

The values in parentheses are steady state values calculated based on the parameters presented in Table 1.

The steady state values can be calculated for all real variables per unit of labor input, i.e., $\frac{k^*}{h^*}, \frac{y^*}{h^*}, \frac{c^*}{h^*}, \frac{i^*}{h^*}$. The wage rate and capital rental rate are as follows.

$$w^* = (1-\theta)\left(\frac{k^*}{h^*}\right)^{\theta}$$

$$r^* = \theta\left(\frac{k^*}{h^*}\right)^{\theta - 1} - \delta$$
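The steady-state ratios above can be reproduced numerically from the Table 1 parameters. The following sketch is ours, not the paper's code, and should return values close to the figures in parentheses (approximately 124.7, 6.89, 1.5 and 5.39).

```python
beta, delta, theta = 0.99, 0.012, 0.4   # Table 1 parameters

# Capital per unit of labor from the steady-state Euler condition
k_h = ((1 / beta - 1 + delta) / theta) ** (1 / (theta - 1))   # ~124.7
y_h = k_h ** theta                                            # output per unit of labor, ~6.89
i_h = delta * k_h                                             # investment per unit of labor, ~1.5
c_h = y_h - i_h                                               # consumption per unit of labor, ~5.39

w_star = (1 - theta) * k_h ** theta                           # steady-state wage rate
r_star = theta * k_h ** (theta - 1) - delta                   # steady-state rental rate net of depreciation

print(k_h, y_h, i_h, c_h, w_star, r_star)
```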
AI Experiments

The following simulations demonstrate the learning behaviors of a representative RL agent and the resulting economic dynamics. We first compare the agent's decisions (e.g., the choice of labor hours) at the beginning of a learning process with the same agent's decisions after many simulation periods of learning. This is to show the agent's progress of learning in an unknown environment, following the framework of learning from its own past experience. We then compare the learning agent's decisions with what a rational expectations agent would make in the same environment. We also plot series of macroeconomic variables to show that the RBC model with an RL agent makes similar qualitative predictions to a conventional RBC model.

We set up two environments: one is a dynamic³ and deterministic environment without any shocks, and the other is stochastic with technology shocks. This is to first offer a clear comparison of the RL agent's behaviors with a rational expectations agent in a deterministic environment. As most macro insights are derived from stochastic models, we then highlight that the RL agent behaves and learns well in a stochastic environment as well.
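To fix ideas, below is a minimal gym-style sketch of such an environment based on the model equations of Section II. The choice of (h_t, x_t) as the action, the clipping of infeasible choices, and the initial capital stock are our own illustrative assumptions rather than the paper's exact setup (summarized in Table 3); setting sigma_eps = 0 recovers the deterministic environment.

```python
import numpy as np

class RBCEnv:
    """Minimal sketch of the RBC environment: state (k_t, z_t), action (h_t, x_t)."""

    def __init__(self, theta=0.4, delta=0.012, alpha=0.64, rho=0.95,
                 sigma_eps=0.007, k0=10.0, seed=0):
        self.theta, self.delta, self.alpha = theta, delta, alpha
        self.rho, self.sigma_eps, self.k0 = rho, sigma_eps, k0
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.k, self.z = self.k0, 0.0          # initial capital (illustrative) and zero shock
        return np.array([self.k, self.z])

    def step(self, action):
        h, x = action                          # labor hours and investment chosen by the agent
        h = float(np.clip(h, 1e-3, 1 - 1e-3))  # assumption: clip labor into the feasible range
        y = np.exp(self.z) * self.k ** self.theta * h ** (1 - self.theta)  # production
        c = y - x                              # resource constraint: c_t = y_t - x_t
        if c > 0:
            reward = (1 - self.alpha) * np.log(c) + self.alpha * np.log(1 - h)  # period utility
        else:
            reward = -1e3                      # assumption: penalize infeasible consumption
            x = 0.0
        self.k = (1 - self.delta) * self.k + x                               # capital accumulation
        self.z = self.rho * self.z + self.rng.normal(0.0, self.sigma_eps)    # AR(1) technology shock
        done = False                           # infinite-horizon problem; episodes truncated externally
        return np.array([self.k, self.z]), reward, done
```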
Implementation

We implement the DDPG algorithm in this paper. It was first introduced by Lillicrap et al. (2015)⁴ in the paper "Continuous Control with Deep Reinforcement Learning". The algorithm was designed to address the difficulty of applying RL methods to continuous action spaces. The main idea behind DDPG is to use a DNN to approximate the action-value function. DDPG was an extension of DPG (Deterministic Policy Gradient) to continuous action spaces, using DQN (Deep Q-Network) to estimate the Q-function. The Q-function refers to an action-value function. It reflects expected cumulative rewards. It is a mapping from a state-action pair to the expected value. For more information on deep RL algorithms, please refer to Atashbar and Shi (2022a).
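The following is a minimal sketch of the two networks at the heart of DDPG: an actor that maps states to continuous actions, and a critic that approximates the action-value (Q) function, together with the soft target-network update used by the algorithm. It assumes PyTorch; the architecture and layer sizes are illustrative and not those used in the paper (whose algorithm-related parameters are listed in Table 2).

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in (0, 1)."""
    def __init__(self, state_dim=2, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid(),  # bounded continuous actions
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a): maps a state-action pair to expected cumulative reward."""
    def __init__(self, state_dim=2, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Slowly track the learned networks with target networks, as DDPG does for stability."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_((1.0 - tau) * t_param.data + tau * s_param.data)
```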
The DDPG algorithm has been the harbinger of modern reinforcement learning and has been the launchpad for the development of many other interesting RL algorithms. One offshoot of DDPG is called TD3 (Twin Delayed DDPG), which uses a clipped double-Q function for learning policies. Another offshoot is MADDPG (Multi-Agent Deep Deterministic Policy Gradient), which is an extension of DDPG to the so-called "centralized training with decentralized execution" setting.
¹ Ratio $\frac{i^*}{y^*} = 1 - \frac{c^*/h^*}{y^*/h^*} = 0.28$.

³ The state variables depend on past actions of the AI agent, as illustrated in the transition equations cell in Table 3. The first environment is both deterministic (absence of exogenous shocks) and dynamic.

⁴ The full algorithm is attached in the annex. More advanced algorithms such as soft actor-critic have also been developed and can achieve more stable learning.