Multiple Regression Analysis: Model Specification and Data Problems
Page 1 of 29, August 28, 2022

Contents
Functional form misspecification
Using proxy variables
Measurement error in variables
Missing data and outlying observations

Functional Form Misspecification

Functional Form
We've seen that a linear regression can fit nonlinear relationships:
Can use logs on the RHS, the LHS, or both
Can use quadratic forms of the x's
Can use interactions of the x's
How do we know if we've gotten the right functional form for our model?

Functional Form (continued)
First, use economic theory to guide you:
$Y = A K^{a} L^{b} e^{u}$, or $\ln Y = \ln A + a \ln K + b \ln L + u$
Think about the interpretation:
$\log(wage) = b_0 + b_1\,educ + u$, or use $\log(educ)$ as the independent variable
Does it make more sense for x to affect y in percentage terms (use logs) or in absolute terms?
Does it make more sense for the derivative of y with respect to $x_1$ to vary with $x_1$ (quadratic), to vary with $x_2$ (interactions), or to be fixed?

Functional Form (continued)
We already know how to test joint exclusion restrictions to see whether higher-order terms or interactions belong in the model:
$\log(wage) = b_0 + b_1\,educ + b_2\,exper + b_3\,tenure + u$
$\log(wage) = b_0 + b_1\,educ + b_2\,exper + b_3\,tenure + b_4\,educ^2 + b_5\,exper^2 + b_6\,tenure^2 + b_7\,educ \cdot tenure + u$
It can be tedious to add and test extra terms, and we may find that a squared term matters when using logs would really be even better.
A test of functional form is Ramsey's regression specification error test (RESET).
First estimate $\log(wage) = b_0 + b_1\,educ + b_2\,exper + b_3\,tenure + u$
Get the fitted values $\hat{y}$ (that is, $\widehat{\log(wage)}$) from this equation.
Then consider the expanded equation
$\log(wage) = b_0 + b_1\,educ + b_2\,exper + b_3\,tenure + d_4\,\hat{y}^2 + d_5\,\hat{y}^3 + u$
RESET is the F statistic for testing $H_0: d_4 = 0,\ d_5 = 0$.
In Stata, the RESET test command is ovtest (estat ovtest in current versions), run after the regression.
Is the model $y = b_0 + b_1 x_1 + \dots + b_k x_k + u$ misspecified?
RESET relies on a trick similar to the special form of the White test.
Instead of adding functions of the x's directly, we add and test functions of $\hat{y}$.
So, estimate $y = b_0 + b_1 x_1 + \dots + b_k x_k + d_1 \hat{y}^2 + d_2 \hat{y}^3 + error$ and test $H_0: d_1 = 0,\ d_2 = 0$ using $F \sim F_{2,\,n-k-3}$ or $LM \sim \chi^2_2$.

RESET Test, Example
Housing price equation (hprice.raw):
$price = b_0 + b_1\,lotsize + b_2\,sqrft + b_3\,bdrms + u$
$\log(price) = b_0 + b_1 \log(lotsize) + b_2 \log(sqrft) + b_3\,bdrms + u$
RESET test procedure:
Estimate the model: regress price on lotsize, sqrft, and bdrms, and save the fitted values $\hat{y}$ and the restricted sum of squared residuals. $SSR_r = 300723.806$, $n = 88$, $R^2 = 0.6724$.
Calculate $\hat{y}^2$ and $\hat{y}^3$, add them to the original equation, and estimate it. That is, regress price on lotsize, sqrft, bdrms, $\hat{y}^2$, and $\hat{y}^3$. $SSR_{ur} = 269983.825$, $n = 88$, $R^2 = 0.7059$.
So $F = \dfrac{(300723.806 - 269983.825)/2}{269983.825/82} = 4.6682$ with p-value 0.012. We therefore reject the null hypothesis of no misspecification in the level model.
In the same way, for the log model, $F = \dfrac{(2.86256385 - 2.69401081)/2}{2.69401081/82} = 2.565$ with p-value 0.0835. So we cannot reject the null hypothesis at the 5% significance level.

Nonnested Alternatives Test
If two models have the same dependent variable but nonnested x's, we can still build one large model containing the x's from both and test the joint exclusion restrictions that lead back to one model or the other.
For example, we have to choose between
$y = b_0 + b_1 x_1 + b_2 x_2 + u$  (m1)
$y = b_0 + b_1 \log(x_1) + b_2 \log(x_2) + u$  (m2)
Which model to choose?

Method 1: estimate a comprehensive model
$y = d_0 + d_1 x_1 + d_2 x_2 + d_3 \log(x_1) + d_4 \log(x_2) + u$
$H_0: d_3 = 0,\ d_4 = 0$ tests (m1) by excluding the regressors of (m2); $H_0: d_1 = 0,\ d_2 = 0$ tests (m2) by excluding the regressors of (m1).
Method 2: the Davidson-MacKinnon test
If (m1) is true, then the fitted values from (m2) should be insignificant in (m1).
Thus, to test (m1), we first estimate (m2) by OLS to obtain the fitted values $\hat{y}$, then add them to (m1):
$y = b_0 + b_1 x_1 + b_2 x_2 + q\,\hat{y} + u$
A significant t statistic on $\hat{y}$ is a rejection of model (m1).

Proxy Variables

Proxy Variables
What if the model is misspecified because no data are available on an important x variable?
It may be possible to avoid omitted variable bias by using a proxy variable.
Model: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3^{*} + u$
A proxy variable must be related to the unobservable variable, for example $x_3^{*} = d_0 + d_3 x_3 + v_3$, where * denotes an unobserved variable.
Now suppose we just substitute $x_3$ for $x_3^{*}$.

Proxy Variables (continued)
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3^{*} + u$
$x_3^{*} = d_0 + d_3 x_3 + v_3$
What do we need for this solution to give us consistent estimates of $b_1$ and $b_2$?
Assume u is uncorrelated with $x_1$, $x_2$, $x_3^{*}$, and $x_3$, and $v_3$ is uncorrelated with $x_1$, $x_2$, and $x_3$:
$E(x_3^{*} \mid x_1, x_2, x_3) = E(x_3^{*} \mid x_3) = d_0 + d_3 x_3$
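What plugging in the proxy does to the regression can be seen by substituting the proxy equation into the model. The following is only a short derivation sketch using the symbols already defined on the slide; it introduces no new assumptions.

```latex
\begin{aligned}
y &= b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3^{*} + u \\
  &= b_0 + b_1 x_1 + b_2 x_2 + b_3 \,(d_0 + d_3 x_3 + v_3) + u \\
  &= (b_0 + b_3 d_0) + b_1 x_1 + b_2 x_2 + b_3 d_3\, x_3 + (u + b_3 v_3)
\end{aligned}
```

Under the stated assumptions the composite error $u + b_3 v_3$ is uncorrelated with $x_1$, $x_2$, and $x_3$, which is why the coefficients on $x_1$ and $x_2$ are still $b_1$ and $b_2$.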

So we are really running
$y = (b_0 + b_3 d_0) + b_1 x_1 + b_2 x_2 + b_3 d_3 x_3 + (u + b_3 v_3)$
and have just redefined the intercept, the error term, and the $x_3$ coefficient.

Example: IQ as a Proxy for Ability (wage2.raw, p. 297)
Model: $\log(wage) = b_0 + b_1\,educ + b_2\,exper + b_3\,abil + u$
Assume $E(u \mid educ, exper, abil) = 0$.
Data on ability are not available, but we think IQ may be correlated with ability, that is, $abil = d_0 + d_1\,IQ + v$.
Assume $E(v \mid educ, exper, IQ) = 0$, so we use IQ as a proxy for ability. The estimated model is
$\log(wage) = b_0^{*} + b_1\,educ + b_2\,exper + b_3^{*}\,IQ + u^{*}$
Results (standard errors in parentheses):
$\widehat{\log(wage)} = 5.503 + 0.078\,educ + 0.0198\,exper$  (biased estimate)
standard errors: (0.112) (0.007) (0.003); $n = 935$, $R^2 = 0.1309$
$\widehat{\log(wage)} = 5.198 + 0.057\,educ + 0.0195\,exper + 0.0058\,IQ$  (efficient estimate)
standard errors: (0.122) (0.007) (0.003) (0.001); $n = 935$, $R^2 = 0.1622$

Proxy Variables (continued)
Without our assumptions, we can end up with biased estimates.
Say $x_3^{*} = d_0 + d_1 x_1 + d_2 x_2 + d_3 x_3 + v_3$.
Then we are really running
$y = (b_0 + b_3 d_0) + (b_1 + b_3 d_1) x_1 + (b_2 + b_3 d_2) x_2 + b_3 d_3 x_3 + (u + b_3 v_3)$
The bias will depend on the signs of $b_3$ and $d_j$.
This bias may still be smaller than the omitted variable bias, though.

Lagged Dependent Variables
What if there are unobserved variables and you can't find reasonable proxy variables?
It may be possible to include a lagged dependent variable to account for omitted variables that contribute to both past and current levels of y, that is, use $y_{-1}$ to explain y.
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3^{*} + u$
$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 y_{-1} + u$
Obviously, you must think past and current y are related for this to make sense.

Measurement Error

Measurement Error
Sometimes we have the variable we want, but we think it is measured with error.
Examples: a survey asks how many hours you worked over the last year, or how many weeks you used child care when your child was young.
Measurement error in y is different from measurement error in x.

Measurement Error in a Dependent Variable
Model: $y^{*} = b_0 + b_1 x_1 + \dots + b_k x_k + u$
y is the observable measure of $y^{*}$. Define the measurement error as $e_0 = y - y^{*}$.
Thus, we are really estimating $y = b_0 + b_1 x_1 + \dots + b_k x_k + u + e_0$.
When will OLS produce unbiased results?
If $e_0$ is uncorrelated with each $x_j$ and with u, OLS is unbiased.
If $E(e_0) \neq 0$, then $b_0$ will be biased, though.
While unbiased, the estimates have larger variances than with no measurement error: $Var(u + e_0) = s_u^2 + s_e^2$

Measurement Error in an Explanatory Variable
$y = b_0 + b_1 x_1^{*} + u$
Define the measurement error as $e_1 = x_1 - x_1^{*}$, where $x_1$ is the measure of the true value $x_1^{*}$.
Assume $E(e_1) = 0$ and $E(y \mid x_1^{*}, x_1) = E(y \mid x_1^{*})$.
We are really estimating $y = b_0 + b_1 x_1 + (u - b_1 e_1)$.
The effect of measurement error on the OLS estimates depends on our assumption about the correlation between $e_1$ and $x_1$.
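The estimating equation above comes from replacing the unobserved $x_1^{*}$ by $x_1 - e_1$, and the same substitution shows why the correlation between $e_1$ and $x_1$ is what matters. A short derivation sketch, taking u to be uncorrelated with $x_1$, which is what the assumption $E(y \mid x_1^{*}, x_1) = E(y \mid x_1^{*})$ is meant to guarantee:

```latex
\begin{aligned}
y &= b_0 + b_1 x_1^{*} + u
   = b_0 + b_1 (x_1 - e_1) + u
   = b_0 + b_1 x_1 + (u - b_1 e_1), \\
\operatorname{Cov}(x_1,\; u - b_1 e_1)
  &= \operatorname{Cov}(x_1, u) - b_1 \operatorname{Cov}(x_1, e_1)
   = -\, b_1 \operatorname{Cov}(x_1, e_1).
\end{aligned}
```

The regressor $x_1$ is therefore uncorrelated with the composite error exactly when $\operatorname{Cov}(x_1, e_1) = 0$ (or $b_1 = 0$).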

Suppose $Cov(x_1, e_1) = 0$: OLS remains unbiased, but the variances are larger.

Measurement Error in an Explanatory Variable (cont)
Suppose $Cov(x_1^{*}, e_1) = 0$, known as the classical errors-in-variables (CEV) assumption. Then
$Cov(x_1, e_1) = E(x_1 e_1) = E(x_1^{*} e_1) + E(e_1^2) = 0 + s_e^2$
In the estimated model $y = b_0 + b_1 x_1 + (u - b_1 e_1)$, $x_1$ is correlated with the error, so the estimate is biased.

Measurement Error in an Explanatory Variable (cont)
Under CEV, the probability limit of the OLS slope is $b_1 \cdot Var(x_1^{*})/Var(x_1)$, where $Var(x_1) = Var(x_1^{*}) + s_e^2$.
Notice that the multiplicative error is just $Var(x_1^{*})/Var(x_1)$.
Since $Var(x_1^{*})/Var(x_1) < 1$, the estimate is biased toward zero. This is called attenuation bias.
It's more complicated with a multiple regression, but we can still expect attenuation bias with classical errors in variables.

Missing Data and Outlying Observations

Missing Data: Is It a Problem?
If any observation is missing data on one of the variables in the model, it can't be used.
If data are missing at random, using a sample restricted to observations with no missing values will be fine.
A problem can arise if the data are missing systematically, say high-income individuals refuse to provide income data.

Nonrandom Samples
If the sample is chosen on the basis of an x variable, then the estimates are unbiased.
If the sample is chosen on the basis of the y variable, then we have sample selection bias.
Sample selection can be more subtle.

Outliers
Sometimes an individual observation can be very different from the others and can have a large effect on the outcome.
Sometimes this outlier will simply be due to errors in data entry, which is one reason why looking at summary statistics is important.
Sometimes the observation will just truly be very different from the others.

Outlier Test 1: Studentized Residuals
$e(i) = y_i - b(i)\,x_i$, where $b(i)$ represents the estimated regression slope when the ith observation has been omitted. $E(e(i)) = 0$.
The studentized residual is $e_i^{*} = [\,y_i - b(i)\,x_i\,] / s_i(i)$, where $s_i(i)$ is the standard error of the regression without observation i.
Observations whose studentized residuals are greater than 1.96 in absolute value can be regarded as outliers and should receive special attention.
In Stata, it's easy to calculate the studentized residuals. Use the following command after the regression:
predict rstud, rstudent

Outlier in a Model of Public Spending (HR, Ex 7.3)
In "prdata\ex73.txt":
$exp = -45.698 + 3.234\,aid + 0.00019\,inc - 0.597\,pop$
predict rstud, rstudent   /* calculate the studentized residuals */
The 7th and 14th observations may be outliers.
Omit the 7th observation and estimate again:
$exp = -7.08 + 2.365\,aid + 0.00018\,inc - 0.426\,pop$
Dummy variable method to calculate studentized residuals:
gen d7
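To make the leave-one-out idea behind studentized residuals concrete, here is a minimal Python sketch for a simple regression with an intercept. It is an illustration, not the slides' Stata code. The data are made up, the function names `ols_simple` and `studentized_residuals` are hypothetical, and the scaling used is the usual externally studentized one (the leave-one-out prediction error divided by its estimated standard deviation), which is assumed here to be the quantity the slide has in mind.

```python
import math

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def studentized_residuals(x, y):
    """Externally studentized residuals, one per observation.

    For each i: refit without observation i, predict y_i from that fit,
    and divide the prediction error by its estimated standard deviation.
    """
    out = []
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        a, b = ols_simple(xs, ys)
        m = len(xs)
        # Residual variance of the leave-one-out fit (two estimated parameters)
        s2 = sum((yj - a - b * xj) ** 2 for xj, yj in zip(xs, ys)) / (m - 2)
        xbar = sum(xs) / m
        sxx = sum((xj - xbar) ** 2 for xj in xs)
        # Variance of the out-of-sample prediction error at x_i
        var_pred = s2 * (1 + 1 / m + (x[i] - xbar) ** 2 / sxx)
        e_del = y[i] - (a + b * x[i])
        out.append(e_del / math.sqrt(var_pred))
    return out

# Made-up data: roughly y = 1 + 2x, with the 7th observation far off the line
x = [float(v) for v in range(1, 11)]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 25.0, 17.1, 18.8, 21.2]

r = studentized_residuals(x, y)
# Observation numbers (1-based) whose studentized residual exceeds 1.96 in absolute value
flagged = [i + 1 for i, v in enumerate(r) if abs(v) > 1.96]
print(flagged)
```

Refitting n times is wasteful for large samples. Statistical packages normally get the same numbers from a single fit using the leverage values, but the brute-force loop mirrors the definition on the slide more directly.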

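The two RESET F statistics in the housing price example earlier in these slides can be recomputed from the reported sums of squared residuals. The sketch below uses only the numbers given on that slide. The function names are hypothetical, and the p-value formula is the closed form that holds for an F distribution with exactly two numerator degrees of freedom, so it would not apply to a test with a different number of restrictions.

```python
def f_stat(ssr_r, ssr_ur, q, df_ur):
    """F statistic for q exclusion restrictions.

    ssr_r  : SSR of the restricted model (without the yhat powers)
    ssr_ur : SSR of the unrestricted model (with yhat^2 and yhat^3)
    df_ur  : residual degrees of freedom of the unrestricted model
    """
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

def f_pvalue_q2(f, df_ur):
    """Upper-tail p-value of an F(2, df_ur) variable.

    Closed form valid only for two numerator degrees of freedom:
    P(F > f) = (1 + 2 f / df_ur) ** (-df_ur / 2).
    """
    return (1.0 + 2.0 * f / df_ur) ** (-df_ur / 2.0)

n, k = 88, 3
df_ur = n - k - 3  # 82: three slopes, an intercept, and two added yhat terms

# Level model: price on lotsize, sqrft, bdrms
f_level = f_stat(300723.806, 269983.825, 2, df_ur)
p_level = f_pvalue_q2(f_level, df_ur)

# Log model: log(price) on log(lotsize), log(sqrft), bdrms
f_log = f_stat(2.86256385, 2.69401081, 2, df_ur)
p_log = f_pvalue_q2(f_log, df_ur)

print(round(f_level, 4), round(p_level, 3))
print(round(f_log, 3), round(p_log, 3))
```

The F values match the slide. For the log model the closed form gives a p-value of roughly 0.083, slightly below the 0.0835 reported on the slide. The gap is small and would be consistent with the slide's p-value having been computed from a rounded F value, though that is only a guess.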