
AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model

Tohid Atashbar and Rui (Aruhan) Shi

WP/23/40

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate.

The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

February 2023

© 2023 International Monetary Fund    WP/23/40

IMF Working Paper*

Strategy, Policy and Review Department

AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model
Prepared by Tohid Atashbar and Rui (Aruhan) Shi

Authorized for distribution by Stephan Danninger
February 2023

IMF Working Papers describe research in progress by the author(s) and are published to elicit comments and to encourage debate. The views expressed in IMF Working Papers are those of the author(s) and do not necessarily represent the views of the IMF, its Executive Board, or IMF management.

ABSTRACT: This study seeks to construct a basic reinforcement learning-based AI-macroeconomic simulator. We use a deep RL (DRL) approach (DDPG) in an RBC macroeconomic model. We set up two learning scenarios, one of which is deterministic, without the technology shock, and the other stochastic. The objective of the deterministic environment is to compare the learning agent's behavior to a deterministic steady-state scenario. We demonstrate that in both deterministic and stochastic scenarios, the agent's choices are close to their optimal value. We also present cases of unstable learning behaviours. This AI-macro model may be enhanced in future research by adding additional variables or sectors to the model or by incorporating different DRL algorithms.

RECOMMENDED CITATION: Atashbar, T. and Shi, R. A. 2023. "AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model", IMF Working Papers, WP/23/40.

JEL Classification Numbers: C63; C54; D83; D87; E37

Keywords: Reinforcement learning; Deep reinforcement learning; Artificial intelligence; RL; DRL; Learning algorithms; Macro modeling; RBC; Real business cycles; DDPG; Deep deterministic policy gradient; Actor-critic algorithms

Authors' E-Mail Addresses:

tatashbar@;

ashi@

* The authors would like to thank Stephan Danninger for his helpful comments and suggestions. We appreciate the views and suggestions provided by Mico Mrkaic, Dmitry Plotnikov, Sergio Rodriguez and attendees at the IMF SPR Macro Policy Division Brownbag Seminar. Comments by Allan Dizioli are also gratefully acknowledged. All errors remain our own.

WORKING PAPERS

AI and Macroeconomic Modeling: Deep Reinforcement Learning in an RBC Model

Prepared by Tohid Atashbar and Rui (Aruhan) Shi


Contents

GLOSSARY

INTRODUCTION

AN OVERVIEW OF THE LITERATURE

A REAL BUSINESS CYCLE (RBC) MODEL

Households

Firms

Functional forms and parameters

A deterministic steady state

AI EXPERIMENTS

Experiment I: deterministic environment

Experiment II: stochastic environment

Issues during learning

CONCLUSION

ANNEX I. DDPG ALGORITHM

REFERENCES

FIGURES

Figure 1. SL, UL and RL in ML

Figure 2. Labor hours during training (200 episodes)

Figure 3. Labor hour series during training and testing

Figure 4. Distance to the steady state (SS) values for labor hour and consumption

Figure 5. Productivity shock series z_t

Figure 6. Simulated series during 100 testing periods

Figure 7. Labor hour choice before and after learning (200 episodes)

Figure 8. Distance to deterministic steady states (SS) for labor hour and consumption

Figure 9. Distance to deterministic steady states (SS) for output and investment

Figure 10. Output per unit of labor

Figure 11. Investment per unit of labor

TABLES

Table 1. Baseline parameters for the RBC model

Table 2. Algorithm-related parameters

Table 3. RL setup of the RBC model

Glossary

AGI  Artificial General Intelligence
AI  Artificial Intelligence
ANN  Artificial Neural Networks
DDPG  Deep Deterministic Policy Gradient
DL  Deep Learning
DNN  Deep Neural Network
DPG  Deterministic Policy Gradient
DQN  Deep Q-Network
DRL  Deep Reinforcement Learning
MADDPG  Multi-Agent Deep Deterministic Policy Gradient
RBC  Real Business Cycle
RL  Reinforcement Learning
SAC  Soft Actor-Critic
SL  Supervised Learning
TD3  Twin Delayed DDPG
UL  Unsupervised Learning

Introduction

Macroeconomic modeling is the process of constructing a model that describes the behavior of a macroeconomic system. This process can be used to develop predictions about the future behavior of the system, to understand the relationships between different variables in the system, or to simulate behavior.

Artificial intelligence (AI) is a branch of computer science that deals with the design and development of intelligent computer systems. AI research deals with the question of how to create programs that are capable of intelligent behavior, i.e., the kind of behavior that is associated with human beings, such as reasoning, learning, problem-solving, and acting autonomously.

The two fields could be conceptually combined, as AI techniques could be used to develop more accurate macroeconomic models, or one could use macroeconomic models to help design artificial general intelligent systems that are better able to simulate economic (or more broadly social) behaviors, among many other tasks. AI can be used to automatically identify relationships between variables, or to develop new ways of representing economic systems. AI can also be used to develop methods for automatically learning from data, which can be used to improve the accuracy of predictions. AI could also be used to develop more sophisticated models that take into account a wider range of factors, including non-economic factors such as political instability or weather patterns.

An increasing body of work leverages machine learning for forecasting (Atashbar and Shi, 2022), besides some recent developments in optimization, market design, and algorithmic game theory, but AI's impact on economics, especially in the field of macroeconomic modeling, has been modest so far. This has been caused by a combination of factors, including the relative newness of the field, the difficulty of designing AI agents capable of realistically imitating human behavior in an economy, the lack of data available for training AI models, and the lack of computational resources needed to train and run large macroeconomic simulations.

But with the emergence of a new generation of AI models called reinforcement learning (RL), there is a growing belief that AI will have a transformative impact on macroeconomic modeling (Tilbury, 2022). This is primarily because RL models are much better suited than previous AI models for imitating human behavior. In addition, RL models require much less data to be trained (they generate their own data through interaction with their environment) and could be much more efficient in terms of computational resources in specific settings or algorithms.

The goal of this paper is to build a relatively simple and extendable macroeconomic model based on RL that can generate realistic macroeconomic dynamics that are comparable to models under the rational expectations assumption, while not imposing unrealistic restrictions like perfect foresight on economic agents. The resulting model will be used as a prototype for future extensions in policy experiments, or to customize it to better match the conditions, shocks or data of a particular or global economy.

To this end, we implement an advanced deep RL (DRL) algorithm (the deep deterministic policy gradient (DDPG)) in a real business cycle (RBC) macroeconomic model. We chose the DDPG algorithm for this basic model (with an eye on possible extensions of the model in the future) for several reasons (Sutton and Barto (2018), Graesser and Keng (2019), Zai and Brown (2020) and Powell (2021)):

First, it is one of the modern RL algorithms that can be applied to continuous action space problems, which is crucial for modeling macroeconomic variables. Second, it is one of the RL algorithms that can handle high-dimensional state and action spaces, which are typical in macroeconomic models (e.g., the number of different economic sectors). Third, the separation of policy and value functions in the algorithm allows for analyzing each component independently during the learning process. Fourth, the DDPG algorithm is one of the few RL algorithms that can be applied to non-stationary problems, which are common in macroeconomic modeling.

Fifth, it is one of the few RL algorithms that can be applied to problems with a very long time horizon, which might be important for macroeconomic modeling. Sixth, the DDPG algorithm is one of the few RL algorithms that can be applied, in specific settings, to partially observable Markov decision process (POMDP) problems or, in other words, to problems with a limited observation window or limited information settings. This could be important for some macroeconomic modeling work, since the observation window is often limited by the frequency of the data. Finally, the DDPG algorithm has been shown to perform well in a variety of challenging problems in the RL literature. However, similar to other RL algorithms, the DDPG algorithm is also known to be unstable in some settings and can diverge if the learning process is not properly tuned.

We find that the RL-augmented RBC model performs similarly to the RBC model under the rational expectations assumption once the learning representative agent has learned over many simulation periods. This is achieved starting from a stage in which the representative agent does not understand the economic structure, its own preferences, or how the economy transitions over time. However, the training takes a significant number of simulation periods, in part because the agent needs to generate its own experience to learn from. To simulate realistic household behaviors that match empirical learning periods, further work is needed to calibrate the parameters, or to transfer past experience to the learning agent as a starting point of learning.

These encouraging results need to be put in perspective. In addition to the rudimentary (but extendible) character of our model structure, a disadvantage of our work is also the restricted scope of RBC models. The business cycle variations are only propagated through an exogenous productivity shock. The empirically implied magnitude of true technology shocks is likely to be smaller than what RBC models predict.

Unemployment is also explained in an overly simplified manner: intertemporal substitution between labor and leisure explains employment variations. For workers to gain high utility, it is better to work more in productive periods and less in unproductive periods. However, RBC models are the core component of the DSGE models that are widely applied in policy institutions and central banks. The framework is scalable and easily built on. It is well known and studied, and thus easy to compare learning results with existing theory.

We hope this work will encourage further research in the application of AI and deep RL to macroeconomic problems and will open up a new direction of research combining deep RL with standard macroeconomic models. In particular, we expect it to be a base and extension for more advanced applications at the Fund that explore the use of deep RL for macroeconomic policy analysis.

The rest of the paper is organized as follows. Section I provides a brief literature review of AI and RL/deep RL applications in macroeconomic policy. Section II describes the RBC model. Section III introduces the DRL algorithm, the environment and the AI experiments we conduct, the results, and the issues during learning, and Section IV concludes.

An overview of the literature

Artificial intelligence (AI) is a growing field of computer science focused on creating intelligent computer systems, or machines, that can reason, learn, and act autonomously. AI systems are designed to mimic human cognitive abilities, such as learning, problem solving, and natural language processing.

The term "artificial intelligence" was first coined in 1956 by computer scientist John McCarthy (Andresen, 2002). AI research is highly interdisciplinary, involving disciplines such as computer science, psychology, neuroscience, linguistics, philosophy, and anthropology.

There are three broad categories of AI systems (Goertzel, 2007):

Narrow AI or weak AI systems are designed to perform a specific task, such as facial recognition or modeling financial markets.

General AI or strong AI systems are designed to perform a wide range of tasks, such as reasoning and planning.

Super AI or artificial general intelligence (AGI) systems are hypothetical AI systems that match or exceed human intelligence.

AI is already being heavily used across multiple fields and industries, including healthcare, retail, finance, image processing, autonomous driving, and many more. The application of AI in economics is still in its early stages and has yet to be sufficiently developed in its application. Nonetheless, some theorize that sooner or later, AI-economist machines could catch up with human economists in many areas (Atashbar, 2021a, 2021b). AI has been used in economics mostly for predictions and forecasts, market analysis and the impact analysis of alternative policies. Lu and Zhou (2021), Ruiz-Real et al. (2021), Goldfarb et al. (2019), Cao (2020), and Veloso et al. (2021) look at how AI is or could be used in economics and finance.

Machine learning (ML) is a branch of artificial intelligence that uses artificial neural networks (ANN) to learn from data, without being explicitly programmed. ANN is a data-driven approach to machine learning that is based on the idea of artificial neurons, or nodes, that are connected in layers. The input layer receives the input data, and the output layer produces the output. The hidden layers in between perform the learning by adjusting the weights of the connections between the nodes. Deep learning (DL) is a subset of machine learning that uses a deep neural network (DNN) to model complex patterns in data. A DNN is an ANN with a deep architecture. This means that the neural network contains not only an input layer and an output layer, but also one or more layers in between to add further non-linearities in order to recognize complex patterns in a dataset.
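As an illustration of this layered structure (not taken from the paper; the layer sizes are arbitrary), a minimal feed-forward DNN in PyTorch might look as follows:

```python
import torch
import torch.nn as nn

# A minimal feed-forward network: input layer -> two hidden layers -> output layer.
# Layer sizes are illustrative, not taken from the paper.
class SmallDNN(nn.Module):
    def __init__(self, n_inputs=3, n_hidden=64, n_outputs=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_inputs, n_hidden),   # input layer -> first hidden layer
            nn.ReLU(),                       # non-linearity
            nn.Linear(n_hidden, n_hidden),   # second hidden layer
            nn.ReLU(),
            nn.Linear(n_hidden, n_outputs),  # output layer
        )

    def forward(self, x):
        return self.net(x)

# The weights of the connections are adjusted by gradient descent on a loss.
model = SmallDNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```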

There are three general approaches to the learning processes in machine learning:

Supervised learning (SL): The machine is provided with a set of training data, which includes both the input data and the desired output. The data is labeled. The machine is then able to learn and generalize from this data in order to produce the desired output for new data. The main applications of supervised learning are classification, regression, and prediction.

Unsupervised learning (UL): The machine is provided with a set of input data, but not the desired output. The input is not labeled. The machine must then learn to find patterns and relationships in the data in order to produce the desired output. Semi-supervised learning combines supervised and unsupervised learning. This means that the training dataset contains both labelled data (i.e., every piece of input data is attached to a desired output) and unlabeled data (i.e., input data is not attached to a desired output). The main applications of unsupervised learning are clustering, dimensionality reduction (e.g., principal components), and association rule learning.

Reinforcement learning (RL): It is different from both supervised and unsupervised learning in that it is not given a set of training data. The machine is given a set of rules or objectives, and it must learn how to best achieve these objectives through repeated interactions with the environment. The main applications of reinforcement learning are control, robotics, optimization and gaming.
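As a minimal illustration of this interaction loop, the sketch below assumes a generic, gym-style environment and agent interface (`env.reset`, `env.step`, `agent.act`, `agent.learn` are hypothetical names, not the paper's code):

```python
# Generic RL interaction loop (illustrative sketch).
# Assumed interface:
#   env.reset() -> state, env.step(action) -> (next_state, reward, done)
#   agent.act(state) -> action, agent.learn(...) updates the agent.
def run_episode(env, agent, max_steps=1000):
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                       # choose an action
        next_state, reward, done = env.step(action)     # environment responds
        agent.learn(state, action, reward, next_state)  # learn from own experience
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```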

Figure 1. SL, UL and RL in ML

Source: authors' construction

Deep reinforcement learning (DRL) is a machine learning technique that combines reinforcement learning (RL) with deep learning (DL), meaning that it uses a DNN to represent the RL agent (Li, 2017). This approach is used to solve problems that are too difficult for simple RL algorithms alone. For an introduction to the theory and several algorithms in RL/DRL, see Atashbar and Shi (2022a).

Surveys by Athey (2018), Cameron (2019), Nosratabadi et al. (2020) and Hull (2021) provide a comprehensive review of the methods and use cases of ML and DL in economics. The application of RL and DRL in economics has a relatively short history and is in its early stages. The literature on deep reinforcement learning in economics mainly focuses on the application of deep reinforcement learning in microeconomic settings.

Reinforcement learning has been applied to various economic problems, such as dynamic pricing in electricity markets, auction theory, portfolio management and asset pricing.

Feng et al. (2018) model the rules of an auction as a neural network and use deep learning for the automated design of optimal auctions. They discover new auctions with high revenue for multi-unit auctions with private budgets, including problems with unit-demand bidders. Zheng et al. (2020) employ reinforcement learning to examine and decide on the actions of agents and a social planner in a gather-and-build environment. They demonstrate that AI-driven tax policies enhance the trade-off between equality and productivity over baseline policies.

Dütting et al. (2021) model an auction as a multi-layer neural network, frame optimal auction design as a constrained learning problem, and show how it can be solved using standard machine learning pipelines. They demonstrate generalization limits and describe extensive experiments, recovering essentially all known analytical solutions for multi-item settings, and propose new mechanisms for settings in which the optimal mechanism is unknown.

While still limited, there is also a growing body of literature on the application of reinforcement learning to macroeconomic models. In the macroeconomics literature, deep RL algorithms have largely been used in a few domains. One of them is to use reinforcement learning algorithms to find the optimal possible policy or policy response function, as in Hinterlang and Tänzer (2022) and Covarrubias (2022). This field encompasses general equilibrium model solving as well, as demonstrated by Curry et al. (2022).

The learnability of rational expectations solutions in a general equilibrium model with multiple equilibria is also a topic that Chen et al. (2021) study. Using a representative agent in a monetary model with numerous equilibria, they demonstrate that the RL agent can locally converge to all of the stable states that the monetary model describes.

Modeling rationality and bounded rationality is another area of emphasis. Hill et al. (2021) demonstrate how to solve three rational expectations equilibrium models using discrete heterogeneous agents, as opposed to a continuum of agents or a single representative agent. Shi (2021) investigates RL agents' consumption-saving behavior in a stochastic growth setting. She focuses on the differences in learning behaviors that occur when RL agents vary in terms of their exploration levels, and how this affects the convergence of the optimal policy.

Similar to previous research, our work adds additional evidence testing a DRL algorithm in a macroeconomic model. However, we implement a representative DRL agent in an RBC model, which serves as a fundamental building block for the commonly used New Keynesian DSGE models.

A Real Business Cycle (RBC) Model

The baseline RBC model contains identical and infinitely lived households and firms. The business cycle fluctuations are generated by real shocks, i.e., a technology shock to productivity. In this specification, the households own the firms and rent out capital. The firms issue both debt (bonds) and equity (dividends).

Households

A household makes consumption-saving and work-leisure decisions. He maximizes expected utility:

$$E_0 \sum_{t=0}^{\infty} \beta^t u(c_t, 1-h_t)$$

subject to the constraints:

$$x_t + c_t + b_{t+1} \le w_t h_t + r_t k_t + R_t b_t + \Pi_t$$

$$k_{t+1} \le (1-\delta)k_t + x_t$$

$$k_t \ge 0$$

$k_0$ is given, and the maximization also satisfies the transversality condition.

$c_t$ denotes consumption, $x_t$ denotes investment, $b_{t+1}$ denotes bond holding, $w_t$ the hourly wage rate, $h_t$ denotes hours worked, $r_t$ denotes the return on capital, $k_t$ denotes capital, $R_t$ denotes the interest rate on bond holding, and $\Pi_t$ denotes dividend payments.

$h_t \in [0,1]$ in period $t$, and the consumer receives utility from leisure.

The choices the consumer makes at time $t$ are $(x_t \text{ or } k_{t+1}, c_t, b_{t+1}, h_t)$, given time $t$ information and the interest rate on bonds, $R_{t+1}$.
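For concreteness, a minimal sketch of this household-side accounting in Python follows. It is illustrative only: the budget and capital-accumulation constraints are assumed to bind with equality (so consumption is the residual), and the function and variable names are ours, not the paper's implementation.

```python
# Household-side accounting (illustrative sketch, binding constraints assumed).
def consumption_from_budget(w, h, r, k, R, b, profit, x, b_next):
    """Budget constraint with equality: x + c + b' = w*h + r*k + R*b + Pi."""
    income = w * h + r * k + R * b + profit
    return income - x - b_next

def next_capital(k, x, delta):
    """Capital accumulation: k' = (1 - delta) * k + x."""
    return (1.0 - delta) * k + x
```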

Optimization under Rational Expectations

This section, as well as Section B.1, derives the optimization conditions under the rational expectations assumption. The aim is to compare the learning results of a DRL RBC model with the rational expectations solution. In implementing a DRL algorithm, the first order conditions, including the Euler equation, are not required. The representative household's Lagrangian is:

$$\mathcal{L} = E_0 \sum_{t=0}^{\infty} \beta^t \left\{ u(c_t, h_t) + \lambda_t \left( w_t h_t + r_t k_t + R_t b_t + \Pi_t - c_t - k_{t+1} + (1-\delta)k_t - b_{t+1} \right) \right\}$$

The first order conditions are:

$$\frac{\partial \mathcal{L}}{\partial c_t} = 0 \;\leftrightarrow\; u_c(c_t, h_t) = \lambda_t$$

$$\frac{\partial \mathcal{L}}{\partial h_t} = 0 \;\leftrightarrow\; u_h(c_t, h_t) = \lambda_t w_t$$

$$\frac{\partial \mathcal{L}}{\partial k_{t+1}} = 0 \;\leftrightarrow\; \lambda_t = \beta E_t \lambda_{t+1}\{r_{t+1} + (1-\delta)\}$$

$$\frac{\partial \mathcal{L}}{\partial b_{t+1}} = 0 \;\leftrightarrow\; \lambda_t = \beta E_t \lambda_{t+1} R_{t+1}$$

Combining $\frac{\partial \mathcal{L}}{\partial c_t}$ with $\frac{\partial \mathcal{L}}{\partial k_{t+1}}$ and $\frac{\partial \mathcal{L}}{\partial b_{t+1}}$ yields the Euler equations:

$$u_c(c_t, h_t) = \beta E_t\, u_c(c_{t+1}, h_{t+1})\,(r_{t+1} + 1 - \delta)$$

$$u_c(c_t, h_t) = \beta E_t\, u_c(c_{t+1}, h_{t+1})\, R_{t+1}$$

Combining $\frac{\partial \mathcal{L}}{\partial c_t}$ and $\frac{\partial \mathcal{L}}{\partial h_t}$ yields the intratemporal condition:

$$u_h(c_t, h_t) = w_t\, u_c(c_t, h_t)$$

Firms

A profit-maximizing firm's problem is:

$$\max_{K_t, H_t}\; e^{z_t} F(K_t, H_t) - w_t H_t - r_t K_t$$

where $K_t$ is the capital input, $H_t$ is the labor input, $F$ is a neoclassical production function, such as the Cobb-Douglas production function, and $z_t$ follows an AR(1) process:

$$z_t = \rho z_{t-1} + \epsilon_t$$

where $\epsilon_t$ is sampled from a white noise process.

Optimization under Rational Expectations

The firm's first order conditions give the wage rate and capital rental rate equations:

$$w_t = e^{z_t} F_H(K_t, H_t)$$

$$r_t = e^{z_t} F_K(K_t, H_t)$$


The debt the firm issues is indeterminate in this setup.
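A minimal sketch of the firm side and the productivity process follows, assuming the Cobb-Douglas functional form and parameter values introduced in Table 1 below, and Gaussian innovations for the white-noise process; the function names and defaults are illustrative, not the paper's code.

```python
import numpy as np

THETA = 0.4  # capital share (Table 1)

def output(z, K, H, theta=THETA):
    """Cobb-Douglas production with productivity shock: e^z * K^theta * H^(1-theta)."""
    return np.exp(z) * K**theta * H**(1.0 - theta)

def wage(z, K, H, theta=THETA):
    """Wage = marginal product of labor: e^z * (1-theta) * (K/H)^theta."""
    return np.exp(z) * (1.0 - theta) * (K / H)**theta

def rental_rate(z, K, H, theta=THETA):
    """Capital rental rate = marginal product of capital: e^z * theta * (K/H)^(theta-1)."""
    return np.exp(z) * theta * (K / H)**(theta - 1.0)

def simulate_shock(T, rho=0.95, sigma=0.007, z0=0.0, seed=0):
    """AR(1) productivity process: z_t = rho * z_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    z = np.empty(T)
    z_prev = z0
    for t in range(T):
        z_prev = rho * z_prev + rng.normal(0.0, sigma)
        z[t] = z_prev
    return z
```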

Functional forms and parameters

Table 1 presents baseline parameters following Cooley and Prescott (1995) for US data.

Table 1. Baseline parameters for the RBC model

| Description | Parameter value | Relevant equations |
| --- | --- | --- |
| Utility function parameters | $\chi = 1$ (logarithmic utility), $\alpha = 0.64$ | $u(c_t, h_t) = \dfrac{\left(c_t^{1-\alpha}(1-h_t)^{\alpha}\right)^{1-\chi}}{1-\chi}$; with $\chi = 1$: $u(c_t, h_t) = (1-\alpha)\ln c_t + \alpha \ln(1-h_t)$ |
| Production function | $\theta = 0.4$ | $F(K, H) = K^{\theta} H^{1-\theta}$ |
| Discount rate $\beta$ | $0.99$ | $E_0 \sum_{t=0}^{\infty} \beta^t u(c_t, 1-h_t)$ |
| Autoregressive parameter $\rho$ | $0.95$ | $z_t = \rho z_{t-1} + \epsilon_t$ |
| Standard deviation of $\epsilon_t$, $\sigma_\epsilon$ | $0.007$ | $z_t = \rho z_{t-1} + \epsilon_t$ |
| Capital depreciation $\delta$ | $0.012$ | $k_{t+1} \le (1-\delta)k_t + x_t$ |
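For reference, the Table 1 calibration can be collected in code; the sketch below mirrors the table and the logarithmic utility case ($\chi = 1$). The dictionary and function names are ours, chosen for illustration.

```python
import numpy as np

# Baseline parameters from Table 1 (Cooley and Prescott, 1995 calibration).
PARAMS = {
    "chi": 1.0,          # utility curvature (chi = 1: logarithmic utility)
    "alpha": 0.64,       # weight on leisure in utility
    "theta": 0.40,       # capital share in production
    "beta": 0.99,        # discount factor
    "rho": 0.95,         # AR(1) coefficient of the productivity shock
    "sigma_eps": 0.007,  # std. dev. of the shock innovation
    "delta": 0.012,      # capital depreciation rate
}

def utility(c, h, alpha=PARAMS["alpha"]):
    """Logarithmic utility (chi = 1): (1-alpha)*ln(c) + alpha*ln(1-h)."""
    return (1.0 - alpha) * np.log(c) + alpha * np.log(1.0 - h)
```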

A deterministic steady state

Assume the parameter values and functional forms presented in Section C. At a deterministic steady state, $z_t = 0$, $k_{t+1} = k_t = k^*$, and $c_{t+1} = c_t = c^*$. The first order conditions in Sections A.1 and B.1 become the following steady state conditions:

$$\frac{1}{\beta} - 1 + \delta = \theta\left(\frac{k^*}{h^*}\right)^{\theta - 1} \;\leftrightarrow\; \frac{k^*}{h^*} = \left(\frac{\frac{1}{\beta} - 1 + \delta}{\theta}\right)^{\frac{1}{\theta - 1}} \quad (=124.7)$$

$$y^* = \left(\frac{k^*}{h^*}\right)^{\theta} h^* \;\leftrightarrow\; \frac{y^*}{h^*} = \left(\frac{k^*}{h^*}\right)^{\theta} \quad (=6.89)$$

$$i^* = \delta k^* = \delta\left(\frac{k^*}{h^*}\right) h^* \;\leftrightarrow\; \frac{i^*}{h^*} = \delta\left(\frac{k^*}{h^*}\right) \quad (=1.5)$$


The accounting identity gives the value of consumption[1]:

$$c^* = y^* - i^* \;\leftrightarrow\; \frac{c^*}{h^*} = \frac{y^*}{h^*} - \frac{i^*}{h^*} \quad (=5.39)$$

The values in parentheses are steady state values calculated based on the parameters presented in Table 1.

The steady state values can be calculated for all real variables per unit of labor input, i.e., $\frac{k^*}{h^*}, \frac{y^*}{h^*}, \frac{c^*}{h^*}, \frac{i^*}{h^*}$. The wage rate and capital rental rate are as follows.

$$w^* = (1-\theta)\left(\frac{k^*}{h^*}\right)^{\theta}$$

$$r^* = \theta\left(\frac{k^*}{h^*}\right)^{\theta - 1} - \delta$$
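The steady-state values in parentheses can be verified numerically with a few lines of code. The following sketch (ours, not the paper's) reproduces them under the Table 1 calibration and evaluates the wage and rental-rate expressions as written above.

```python
# Deterministic steady state per unit of labor, Table 1 calibration.
theta, beta, delta = 0.40, 0.99, 0.012

k_h = ((1.0 / beta - 1.0 + delta) / theta) ** (1.0 / (theta - 1.0))  # k*/h*  ~ 124.7
y_h = k_h ** theta                                                   # y*/h*  ~ 6.89
i_h = delta * k_h                                                    # i*/h*  ~ 1.5
c_h = y_h - i_h                                                      # c*/h*  ~ 5.4 (5.39 in the text)
w_ss = (1.0 - theta) * k_h ** theta                                  # steady-state wage
r_ss = theta * k_h ** (theta - 1.0) - delta                          # rental rate net of depreciation, as written above

print(f"{k_h:.1f} {y_h:.2f} {i_h:.2f} {c_h:.2f}")
```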

AI Experiments

The following simulations demonstrate the learning behaviors of a representative RL agent and the resulting economic dynamics. We first compare the agent's decisions (e.g., choice of labor hours) at the beginning of a learning process with the same agent's decisions after many simulation periods of learning. This is to show the agent's progress in learning in an unknown environment, following the framework of learning from its own past experience. We then compare the learning agent's decisions with those a rational expectations agent would make in the same environment. We also plot series of macroeconomic variables to show that the RBC model with an RL agent makes similar qualitative predictions to a conventional RBC model.

We set up two environments: one is a dynamic[3] and deterministic environment without any shocks, and the other is stochastic with technology shocks. This is to first offer a clear comparison of the RL agent's behaviors with those of a rational expectations agent in a deterministic environment. As most macro insights are derived from stochastic models, we then highlight that the RL agent behaves and learns well in a stochastic environment as well.
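Since the two environments differ only in whether a technology shock is drawn, one minimal (illustrative, not the paper's code) way to express the distinction is a single flag that switches the innovation on or off:

```python
import numpy as np

def next_productivity(z, stochastic, rho=0.95, sigma_eps=0.007, rng=None):
    """z_{t+1} = rho * z_t + eps, with eps ~ N(0, sigma_eps^2) in the stochastic
    environment and eps = 0 in the deterministic one."""
    if stochastic:
        rng = rng or np.random.default_rng()
        eps = rng.normal(0.0, sigma_eps)
    else:
        eps = 0.0
    return rho * z + eps
```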

Implementation

We implement the DDPG algorithm in this paper. It was first introduced by Lillicrap et al. (2015)[4] in the paper "Continuous Control with Deep Reinforcement Learning". The algorithm was designed to address the difficulty of applying RL methods to continuous action spaces. The main idea behind DDPG is to use a DNN to approximate the action-value function. DDPG extends DPG (Deterministic Policy Gradient) to continuous action spaces, using DQN-style (Deep Q-Network) techniques to estimate the Q-function. The Q-function refers to an action-value function: it reflects expected cumulative rewards and is a mapping from a state-action pair to an expected value. For more information on deep RL algorithms, please refer to Atashbar and Shi (2022a).
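To illustrate the actor-critic structure behind DDPG, the sketch below shows a deterministic actor that maps a state to a continuous action and a critic that approximates the Q-function for a state-action pair. The network sizes, the two-dimensional state, and the (0, 1) action bound are assumptions made for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in (0, 1),
    e.g., a labor-hour choice. Sizes are illustrative."""
    def __init__(self, state_dim=2, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Sigmoid(),  # bound actions to (0, 1)
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-function approximator: maps a (state, action) pair to an expected
    cumulative reward."""
    def __init__(self, state_dim=2, action_dim=1, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```

The full DDPG algorithm (Annex I) additionally maintains target copies of both networks and a replay buffer of past experience; the sketch above only shows the two function approximators.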

The DDPG algorithm has been a harbinger of modern reinforcement learning and a launchpad for the development of many other interesting RL algorithms. One offshoot of DDPG is TD3 (Twin Delayed DDPG), which uses a clipped double-Q function for learning policies. Another offshoot is MADDPG (Multi-Agent Deep Deterministic Policy Gradient), an extension of DDPG to the so-called "centralized training with decentralized execution" setting.

[1] Ratio $\frac{i^*}{y^*} = 1 - \frac{c^*/h^*}{y^*/h^*} = 0.28$.

[3] The state variables depend on past actions of the AI agent, as illustrated in the transition equations cell in Table 3. The first environment is both deterministic (absence of exogenous shocks) and dynamic.

[4] The full algorithm is attached in the annex. More advanced algorithms such as soft actor-critic have also been developed, which can achieve more stable learning.

