Multiple Regression Analysis: The Heteroskedasticity Problem

Uploaded by: 3*** | IP location: Hubei | Upload date: 2022-03-23 | Format: PPT | Pages: 39 | Size: 124 KB


Chapter 7. Multiple Regression Analysis: Dealing with Heteroskedasticity

Contents
- What is heteroskedasticity?
- Why worry about heteroskedasticity?
- How to test for heteroskedasticity?
- Corrections for heteroskedasticity

What Is Heteroskedasticity?
- Recall that the assumption of homoskedasticity implies that, conditional on the explanatory variables, the variance of the unobserved error $u$ is constant: $\mathrm{Var}(u \mid x) = \sigma^2$ (homoskedasticity).
- If this is not true, that is, if the variance of $u$ differs across values of the $x$'s, then the errors are heteroskedastic: $\mathrm{Var}(u_i \mid x_i) = \sigma_i^2$ (heteroskedasticity).
- Example: if we examine a cross section of firms in one industry, the error terms associated with very large firms might have larger variances than those associated with smaller firms; sales of larger firms might be more volatile than sales of smaller firms.
- Example: consider a cross-section study of family income and expenditures. It seems plausible that low-income families spend at a rather steady rate, while the spending patterns of high-income families are relatively volatile.

[Figure: Example of heteroskedasticity. Conditional densities $f(y \mid x)$ at $x_1$, $x_2$, $x_3$ around the regression line $E(y \mid x) = \beta_0 + \beta_1 x$.]

[Figure: Patterns of heteroskedasticity.]
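To make the definition concrete, here is a minimal Python sketch (the slides themselves use Stata, so this is an illustration rather than the author's code). It simulates errors whose standard deviation grows with x and compares the error variance in the low-x and high-x halves of the sample. The form sd(u|x) = 0.5x, the range of x, and the helper names `simulate` and `variance_by_half` are all assumptions made for the example.

```python
import random
import statistics

def simulate(n=400, seed=0):
    """Draw (x, u) pairs with sd(u|x) = 0.5*x, an assumed illustrative form."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    us = [rng.gauss(0.0, 0.5 * x) for x in xs]   # error sd grows with x
    return xs, us

def variance_by_half(xs, us):
    """Sample variance of u in the low-x half and in the high-x half."""
    pairs = sorted(zip(xs, us))                  # order observations by x
    half = len(pairs) // 2
    low = [u for _, u in pairs[:half]]
    high = [u for _, u in pairs[half:]]
    return statistics.variance(low), statistics.variance(high)

xs, us = simulate()
var_low, var_high = variance_by_half(xs, us)
print(f"Var(u) for low x:  {var_low:.2f}")
print(f"Var(u) for high x: {var_high:.2f}")
```

Under homoskedasticity the two variances would be roughly equal; here the high-x half is visibly more dispersed, which is the pattern the firm-size and family-income examples describe.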

Why Worry About Heteroskedasticity?
- OLS is still unbiased and consistent even if we do not assume homoskedasticity: $E(\hat\beta_j) = \beta_j$, or $\operatorname{plim} \hat\beta_j = \beta_j$.
- The $R^2$ and adjusted $R^2$ are unaffected by heteroskedasticity.
- The usual standard errors of the estimates are biased if we have heteroskedasticity.
- The OLS estimates are not efficient; that is, their variances are not the smallest among linear unbiased estimators.
- If the standard errors are biased, we cannot use the usual t statistics, F statistics, or LM statistics for drawing inferences.
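The bias in the usual standard errors can be checked with exact arithmetic in a simple regression with fixed regressors. The sketch below assumes x = 1, ..., 20 and Var(u_i|x_i) = x_i^2 (both illustrative choices, not from the slides) and compares the true sampling variance of the OLS slope with the expected value of the usual homoskedasticity-only estimator s^2/SST_x. It relies on the standard result E(SSR) = sum of sigma_i^2 (1 - h_ii), where h_ii = 1/n + (x_i - xbar)^2/SST_x is the leverage of observation i.

```python
def slope_variances(xs, sigma2):
    """True Var(slope) and expected usual estimate, for fixed xs and error variances sigma2."""
    n = len(xs)
    xbar = sum(xs) / n
    dev2 = [(x - xbar) ** 2 for x in xs]
    sst = sum(dev2)
    # True variance under heteroskedasticity: sum (x_i - xbar)^2 * sigma_i^2 / SST^2
    true_var = sum(d * s for d, s in zip(dev2, sigma2)) / sst ** 2
    # Expected SSR = sum sigma_i^2 * (1 - leverage_i)
    leverage = [1.0 / n + d / sst for d in dev2]
    expected_ssr = sum(s * (1.0 - h) for s, h in zip(sigma2, leverage))
    # Usual estimator: s^2 / SST with s^2 = SSR / (n - 2)
    usual_var = expected_ssr / (n - 2) / sst
    return true_var, usual_var

xs = [float(i) for i in range(1, 21)]
true_var, usual_var = slope_variances(xs, [x ** 2 for x in xs])
print(f"true Var(slope)           = {true_var:.4f}")
print(f"E[usual variance formula] = {usual_var:.4f}")
```

With these assumed values the usual formula understates the true variance, so t statistics built on it would be too large. Passing a constant `sigma2` makes the two numbers coincide, which is the homoskedastic case.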

How to Test for Heteroskedasticity?

Testing for Heteroskedasticity: Goldfeld-Quandt Test
- Essentially, we want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \ldots, x_k) = E(u^2) = \sigma^2$.
- $H_1: \sigma_i^2 = c x_i^2$.
- Goldfeld-Quandt test procedure:
  1. Order the data by the magnitude of the independent variable $x$ that is thought to be related to the error variance.
  2. Omit the middle $d$ observations; $d$ might be chosen, for example, to be approximately 1/5 of the total sample size.
  3. Fit two separate regressions, the first on the portion of the data with low values of $x$ and the second on the portion with high values of $x$. Each regression involves $(n-d)/2$ observations and $(n-d)/2 - k - 1$ degrees of freedom.
  4. Calculate the residual sum of squares of each regression: $SSR_1$ for the low $x$'s and $SSR_2$ for the high $x$'s.
  5. Under $H_0$, the statistic $F = SSR_2 / SSR_1$ is distributed as an F statistic with $[n - d - 2(k+1)]/2$ degrees of freedom in both the numerator and the denominator.

Example: Goldfeld-Quandt Test (HR: Ex6.2, 154)
    insheet using path\ex61.txt
    sort inc
    reg hexp inc if inc…=15
- The split regressions give $SSR_2 = 2.024$, with $n_1 = n_2 = 10$ and $k + 1 = 2$.
- Form the statistic $F = SSR_2 / SSR_1 = 6.7467$ (which implies $SSR_1 = SSR_2 / F \approx 0.300$).
- The 5% critical value is $F_{8,8} = 3.438$.
- So we reject the null hypothesis and conclude that the data are heteroskedastic.
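The Goldfeld-Quandt procedure can be sketched in a few lines of Python for the one-regressor case (k = 1). This is an illustration under stated assumptions, not the Stata workflow of the slides: the function names `ssr_simple` and `goldfeld_quandt`, and the artificial data whose error is proportional to x with alternating sign, are made up for the example.

```python
def ssr_simple(xs, ys):
    """Residual sum of squares from an OLS regression of y on a constant and x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def goldfeld_quandt(xs, ys, d):
    """Return (F, df) for the Goldfeld-Quandt test with the middle d observations omitted."""
    pairs = sorted(zip(xs, ys))              # step 1: order the data by x
    m = (len(pairs) - d) // 2                # observations in each subsample
    low, high = pairs[:m], pairs[-m:]        # step 2: drop the middle d observations
    ssr1 = ssr_simple(*zip(*low))            # steps 3-4: regression on low x
    ssr2 = ssr_simple(*zip(*high))           #            regression on high x
    df = m - 2                               # (n - d)/2 - k - 1 with k = 1
    return ssr2 / ssr1, df                   # step 5: F = SSR2 / SSR1

# Artificial data: y = 1 + 2x + u, with |u| proportional to x (assumed for illustration).
xs = list(range(1, 21))
ys = [1 + 2 * x + 0.5 * x * (-1) ** x for x in xs]
F, df = goldfeld_quandt(xs, ys, d=4)
print(f"F = {F:.2f} with ({df}, {df}) degrees of freedom")
```

The resulting F is compared with the upper critical value of the F(df, df) distribution; a large ratio indicates that the error variance is larger in the high-x subsample.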

Testing for Heteroskedasticity
- Essentially, we want to test $H_0: \mathrm{Var}(u \mid x_1, x_2, \ldots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2 \mid x_1, x_2, \ldots, x_k) = E(u^2) = \sigma^2$.
- If we assume the relationship between $u^2$ and the $x_j$ is linear, we can test this as a linear restriction.
- So, for $u^2 = \delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k + v$, this means testing $H_0: \delta_1 = \delta_2 = \cdots = \delta_k = 0$.

The Breusch-Pagan Test
- We do not observe the error, but we can estimate it with the residuals from the OLS regression: regress $y$ on $x_1, x_2, \ldots, x_k$ and save the residuals $\hat u_i$.
- After regressing the squared residuals on all of the $x$'s, we can use the $R^2$ of that regression to form an F or LM test: regress $\hat u^2$ on $x_1, x_2, \ldots, x_k$ and test the joint hypothesis that all slope coefficients are zero.
- The F statistic is just the reported F statistic for overall significance of the regression, $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$, which is distributed $F_{k,\,n-k-1}$.
- The LM statistic is $LM = nR^2$, which is distributed $\chi^2_k$.
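The Breusch-Pagan steps translate directly into code. The following Python sketch handles the one-regressor case (k = 1); the helper names `fit_simple`, `bp_statistics`, and `breusch_pagan` and the artificial data are assumptions for illustration, and the slides themselves carry out these steps in Stata.

```python
def fit_simple(xs, ys):
    """OLS of y on a constant and x. Returns (residuals, R-squared)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    sst = sum((y - ybar) ** 2 for y in ys)
    return resid, 1.0 - sum(e * e for e in resid) / sst

def bp_statistics(r2, n, k):
    """F and LM statistics from the R-squared of the auxiliary regression."""
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return f_stat, n * r2

def breusch_pagan(xs, ys):
    """Regress y on x, then the squared residuals on x, and form F and LM."""
    resid, _ = fit_simple(xs, ys)
    usq = [e * e for e in resid]             # squared OLS residuals
    _, r2 = fit_simple(xs, usq)              # auxiliary regression of u_hat^2 on x
    return bp_statistics(r2, len(xs), 1)

# Artificial data: y = 1 + 2x + u, with |u| proportional to x (assumed for illustration).
xs = list(range(1, 21))
ys = [1 + 2 * x + 0.5 * x * (-1) ** x for x in xs]
f_stat, lm = breusch_pagan(xs, ys)
print(f"F = {f_stat:.2f}, LM = {lm:.2f}")
```

F is compared with the F(k, n-k-1) distribution and LM with the chi-square distribution with k degrees of freedom; both use only the R-squared of the auxiliary regression.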

14、oknreg hexp inc /* use all observations*/npredict res, r /* get the residuals*/ngen ressq=res2 /*square of res*/nreg ressq incnget the F value is 10.13 and p-value is 0.52%.nSo, we reject the null hypothesis of homoskedasticity at 1% significance.nUse LM test, nR=200.36=7.2nThe critical value 2(1)=3

15、.84, p-value is 0.73%, we get the same result.16Example: Housing price Equation (Wooldridge, p267)nEstimated modelprce =-21770.31+2.068lotsize + 122.778sqrft + 13852.52 bdrmspredict res, r. we get the residuals i of above eq.gen ressq=res2reg ressq on lotsize, sqrft, bdrmsressq=-5.52e9+201520.9lotsi

16、ze+1691037sqrft+1.04e9bdrmsF=5.34 p-value = 0.20%nR2=880.1601=14.1152 2(3)=7.8147 p-value = 0.28%So, we have a strong evidence to reject the null hypothesis of homoskedasticity.17Example: Housing price Equation (Wooldridge, p267), cont.nWe check whether there is heteroskedasticity in log form.nEstim

17、ated model islog(prce) =5.611+0.168log(lotsize) + 0.700log(sqrft) + 0.037 bdrmspredict resid, rgen residsq=resid2regress residsq on log(lotsize), log(sqrft), bdrmsresdsq=0.510 0.007 log(lotsize)-0.063 log(sqrft)+0.017 bdrmsF=1.41 p-value=24.51%nR2=88*0.048=4.224, p-value=23.83%nSo, we cant reject th
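The LM p-values reported in these examples can be reproduced with the chi-square upper-tail probability. The Python sketch below uses only the standard library; `chi2_sf` is a hypothetical helper name, and it relies on the closed-form series for integer degrees of freedom rather than on any statistics package. The sample sizes and R-squared values are the ones quoted in the examples.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # even df = 2m: exp(-x/2) * sum_{j=0}^{m-1} (x/2)^j / j!
        m = df // 2
        return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(m))
    # odd df = 2m + 1: erfc(sqrt(x/2)) + exp(-x/2) * sum_{j=1}^{m} (x/2)^(j - 1/2) / Gamma(j + 1/2)
    m = (df - 1) // 2
    tail = sum(half ** (j - 0.5) / math.gamma(j + 0.5) for j in range(1, m + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

# (label, n, R-squared of the auxiliary regression, number of regressors k)
examples = [
    ("Ex6.2, hexp on inc", 20, 0.36, 1),
    ("housing price, levels", 88, 0.1601, 3),
    ("housing price, logs", 88, 0.048, 3),
]
for label, n, r2, k in examples:
    lm = n * r2                              # LM = n * R^2
    print(f"{label}: LM = {lm:.3f}, p-value = {chi2_sf(lm, k):.4f}")
```

The same function covers any of the LM tests in this chapter: only the degrees of freedom change with the number of regressors in the auxiliary regression.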

The White Test
- The Breusch-Pagan test will detect any linear forms of heteroskedasticity.
- The White test allows for nonlinearities by using squares and cross products of all the $x$'s. For example, with $k = 3$:
  $\hat u^2 = \delta_0 + \delta_1 x_1 + \delta_2 x_2 + \delta_3 x_3 + \delta_4 x_1^2 + \delta_5 x_2^2 + \delta_6 x_3^2 + \delta_7 x_1 x_2 + \delta_8 x_1 x_3 + \delta_9 x_2 x_3 + v$
- We still just use an F or LM statistic to test whether all the $x_j$, $x_j^2$, and $x_j x_h$ are jointly significant.
- This can get to be unwieldy pretty quickly.

Alternate Form of the White Test
- Consider that the fitted values from OLS, $\hat y$, are a function of all the $x$'s.
- Thus, $\hat y^2$ will be a function of the squares and cross products, and $\hat y$ and $\hat y^2$ can proxy for all of the $x_j$, $x_j^2$, and $x_j x_h$.
- So regress the squared residuals on $\hat y$ and $\hat y^2$ and use the $R^2$ to form an F or LM statistic.
- Note that we are only testing 2 restrictions now.
- The procedure for this special case of the White test:
  1. Regress $y$ on $x_1, x_2, \ldots, x_k$ and save the residuals $\hat u_i$.
  2. Calculate $\hat y$ and $\hat y^2$ (predict ybar, xb; gen ybarsq=ybar^2).
  3. Regress $\hat u^2$ on $\hat y$ and $\hat y^2$, and test the joint hypothesis that both slope coefficients are zero.
  4. Use the F statistic or the LM statistic to test the null hypothesis of homoskedasticity.

Example: White Test in the Log Housing Price Equation
- log(price) = 5.611 + 0.168 log(lotsize) + 0.700 log(sqrft) + 0.037 bdrms
    predict resid, r
    predict lpbar
    gen residsq=resid^2
    gen lpbarsq=lpbar^2
    regress residsq lpbar lpbarsq
- residsq = 23.778 - 3.714 lpbar + 0.145 lpbarsq, with F = 1.73 and p-value = 18.30%.
- $nR^2 = 88 \times 0.0392 = 3.4496$, with p-value = 17.82%.
- We get the same result as with the BP test: there is no evidence of heteroskedasticity.
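The special case of the White test needs a regression with two regressors plus a constant, so the Python sketch below includes a small normal-equations OLS routine. It is an illustration only: `solve`, `ols`, and `white_special` are hypothetical helper names, the data are artificial, and solving the normal equations directly is an assumption that is adequate for small, well-scaled examples but not a recommendation for production use.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]       # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """OLS of y on the columns of rows (include a 1.0 column for the intercept).

    Returns (coefficients, fitted values, R-squared)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ybar = sum(y) / len(y)
    ssr = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, fitted, 1.0 - ssr / sst

def white_special(rows, y):
    """Regress u_hat^2 on y_hat and y_hat^2; return (LM, R-squared)."""
    _, fitted, _ = ols(rows, y)                          # step 1: original regression
    usq = [(yi - f) ** 2 for yi, f in zip(y, fitted)]    # squared residuals
    aux = [[1.0, f, f * f] for f in fitted]              # step 2: y_hat and y_hat^2
    _, _, r2 = ols(aux, usq)                             # step 3: auxiliary regression
    return len(y) * r2, r2                               # step 4: LM = n * R^2

# Artificial data: y = 1 + 2x + u, with |u| proportional to x (assumed for illustration).
xs = list(range(1, 21))
rows = [[1.0, float(x)] for x in xs]
y = [1 + 2 * x + 0.5 * x * (-1) ** x for x in xs]
lm, r2 = white_special(rows, y)
print(f"LM = {lm:.2f} (compare with chi-square, 2 df), R2 = {r2:.3f}")
```

Whatever the number of regressors in the original model, the auxiliary regression always has two slope coefficients, so LM is compared with a chi-square distribution with 2 degrees of freedom.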

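Corrections for Heteroskedasticity

Corrections for Heteroskedasticity: Known Variances
- Suppose $\mathrm{Var}(u_i \mid x) = \sigma_i^2$ is known.
- The original model is $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$. Divide both sides by $\sigma_i$.
- The new disturbance is $u_i^* = u_i/\sigma_i$, so $\mathrm{Var}(u_i^*) = \mathrm{Var}(u_i/\sigma_i) = \mathrm{Var}(u_i)/\sigma_i^2 = 1$.
- So the new model is $y/\sigma_i = \beta_0/\sigma_i + \beta_1 x_1/\sigma_i + \cdots + \beta_k x_k/\sigma_i + u/\sigma_i$, that is, $y_i^* = \beta_0 x_{i0}^* + \beta_1 x_{i1}^* + \cdots + \beta_k x_{ik}^* + u_i^*$ with $x_{i0}^* = 1/\sigma_i$.
- We can estimate the new model with OLS; this is called weighted least squares (WLS).
- But usually we do not know the variances.

Case of Form Being Known up to a Multiplicative Constant
- Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid x) = \sigma^2 h(x)$, where the trick is to figure out what $h(x) \equiv h_i$ looks like.
- $E(u_i/\sqrt{h_i} \mid x) = 0$, because $h_i$ is only a function of $x$, and $\mathrm{Var}(u_i/\sqrt{h_i} \mid x) = \sigma^2$, because we know $\mathrm{Var}(u \mid x) = \sigma^2 h_i$.
- So, if we divide our whole equation by $\sqrt{h_i}$, we have a model where the error is homoskedastic.

Example: Simple Savings Function
- Consider the simple savings function $sav_i = \beta_0 + \beta_1 inc_i + u_i$ with $\mathrm{Var}(u_i \mid inc_i) = \sigma^2 inc_i$.
- Then $h_i = inc_i$ and $\mathrm{Var}(u_i/\sqrt{inc_i} \mid inc_i) = \mathrm{Var}(u_i \mid inc_i)/inc_i = \sigma^2$.
- So we divide the original equation by $\sqrt{inc_i}$ to get $sav_i/\sqrt{inc_i} = \beta_0 (1/\sqrt{inc_i}) + \beta_1 \sqrt{inc_i} + u_i/\sqrt{inc_i}$.
- Using the data saving.raw, the OLS regression is sav_i = -124.84 + 0.147 inc_i.
- The WLS regression is sav*_i = -124.95 wb + 0.172 inc*_i, with standard errors (480.86) and (0.057), n = 100, R2 = 0.2259, where wb = 1/sqrt(inc_i). You can write it as sav_i = -124.95 + 0.172 inc_i.

Generalized Least Squares
- Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
- GLS will be BLUE in this case, because the transformed equation satisfies the Gauss-Markov assumptions.
- GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.
- The sum of squared residuals in the transformed variables is
  $\sum_{i=1}^{n} \left( y_i^* - \beta_0 x_{i0}^* - \beta_1 x_{i1}^* - \cdots - \beta_k x_{ik}^* \right)^2 = \sum_{i=1}^{n} \left( \frac{y_i}{\sqrt{h_i}} - \frac{\beta_0}{\sqrt{h_i}} - \beta_1 \frac{x_{i1}}{\sqrt{h_i}} - \cdots - \beta_k \frac{x_{ik}}{\sqrt{h_i}} \right)^2 = \sum_{i=1}^{n} \frac{\left( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik} \right)^2}{h_i}$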
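The transformation behind WLS can be written out for the one-regressor case. The Python sketch below divides every variable by sqrt(h_i) and then runs OLS without a constant on the transformed regressors, as in the savings example with h_i = inc_i. The function name `wls_simple`, the simulated data, and the parameter values (intercept -0.1, slope 0.15, error sd 0.3*sqrt(inc)) are assumptions for illustration; they are not the saving.raw data.

```python
import math
import random

def wls_simple(xs, ys, h):
    """WLS for y = b0 + b1*x + u with Var(u|x) = sigma^2 * h_i.

    Divides each observation by sqrt(h_i) and runs OLS without a constant on
    the transformed regressors (1/sqrt(h_i), x_i/sqrt(h_i))."""
    w = [1.0 / math.sqrt(hi) for hi in h]
    z0 = w                                        # transformed intercept column
    z1 = [x * wi for x, wi in zip(xs, w)]         # transformed regressor
    ystar = [y * wi for y, wi in zip(ys, w)]      # transformed outcome
    # Normal equations of the two-regressor, no-constant regression
    a00 = sum(v * v for v in z0)
    a01 = sum(u * v for u, v in zip(z0, z1))
    a11 = sum(v * v for v in z1)
    c0 = sum(u * v for u, v in zip(z0, ystar))
    c1 = sum(u * v for u, v in zip(z1, ystar))
    det = a00 * a11 - a01 * a01
    b0 = (c0 * a11 - c1 * a01) / det              # Cramer's rule
    b1 = (a00 * c1 - a01 * c0) / det
    return b0, b1

# Simulated savings-style data with Var(u|inc) proportional to inc (assumed values).
rng = random.Random(1)
inc = [rng.uniform(1.0, 20.0) for _ in range(200)]
sav = [-0.1 + 0.15 * x + rng.gauss(0.0, 0.3 * math.sqrt(x)) for x in inc]
b0_hat, b1_hat = wls_simple(inc, sav, h=inc)
print(f"WLS intercept = {b0_hat:.3f}, slope = {b1_hat:.3f}")
```

The coefficients keep their original interpretation: b0 and b1 are still the intercept and slope of the untransformed savings equation, which is why the WLS results can be written back in the original variables.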

More on WLS
- Consider the wage determination equation $wage_{i,e} = \beta_0 + \beta_1 educ_{i,e} + \beta_2 age_{i,e} + \beta_3 tenure_{i,e} + u_{i,e}$, where $i$ denotes a particular firm and $e$ denotes an employee within the firm.
- Assume the above equation satisfies the Gauss-Markov assumptions; then we could estimate it, given a sample of individuals across various firms.
- But we only have the average values of wages, education, age, and tenure by firm. That is, individual-level data are not available.
- Thus, let $\overline{wage}_i$, $\overline{educ}_i$, $\overline{age}_i$, and $\overline{tenure}_i$ denote the average wage, average education, average age, and average tenure for the people at firm $i$. Then the original equation can be transformed to $\overline{wage}_i = \beta_0 + \beta_1 \overline{educ}_i + \beta_2 \overline{age}_i + \beta_3 \overline{tenure}_i + \bar u_i$.

More on WLS, cont.
- If the original equation at the individual level satisfies the homoskedasticity assumption, then the firm-level (transformed) equation must be heteroskedastic, unless all firms have the same number of employees.
- If $\mathrm{Var}(u_{i,e}) = \sigma^2$ for all $i$ and $e$, then $\mathrm{Var}(\bar u_i) = \sigma^2/m_i$, where $m_i$ is the number of employees in firm $i$.
- In this case $h_i = 1/m_i$, and the most efficient procedure is WLS with weights equal to the number of employees at the firm, $1/h_i = m_i$. This ensures that larger firms receive more weight, and it gives us an efficient way of estimating the parameters of the individual-level model when we only have averages at the firm level.
- A similar weighting arises when we are using per capita data at the city, county, state, or country level. If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. Therefore, weighted least squares with weights equal to the population is appropriate.

Summary of WLS
- WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
- In most cases, we will not know the form of the heteroskedasticity.
- One example where we do is when the data are aggregated but the model is at the individual level.
- There we want to weight each aggregate observation by the number of individuals, that is, by the inverse of $h_i = 1/m_i$.
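The variance of the firm-level error follows in one line from the definition of an average. The derivation below assumes, in addition to homoskedasticity at the individual level, that the errors of different employees within a firm are uncorrelated; that assumption is what lets the variance of the sum equal the sum of the variances.

```latex
\bar u_i = \frac{1}{m_i}\sum_{e=1}^{m_i} u_{i,e}
\quad\Longrightarrow\quad
\mathrm{Var}(\bar u_i)
  = \frac{1}{m_i^2}\sum_{e=1}^{m_i}\mathrm{Var}(u_{i,e})
  = \frac{m_i\,\sigma^2}{m_i^2}
  = \frac{\sigma^2}{m_i},
\qquad\text{so } h_i = \frac{1}{m_i}.

\mathrm{Var}\!\left(\sqrt{m_i}\,\bar u_i\right)
  = m_i \cdot \frac{\sigma^2}{m_i}
  = \sigma^2 .
```

The second line shows why the weights are the firm sizes: multiplying the firm-level equation by the square root of m_i, which is the same as dividing by the square root of h_i, restores a constant error variance.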

Feasible GLS
- More typical is the case where you do not know the form of the heteroskedasticity.
- In this case, you need to estimate $h(x_i)$.
- Typically, we start with the assumption of a fairly flexible model, such as $\mathrm{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k)$.
- Since we do not know the $\delta$'s, we must estimate them.

Feasible GLS (continued)
- Our assumption implies that $u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \cdots + \delta_k x_k)\,v$, where $E(v \mid x) = 1$.
- If $v$ is independent of $x$, then $\ln(u^2) = \alpha_0 + \delta_1 x_1 + \cdots + \delta_k x_k + e$, where $E(e) = 0$ and $e$ is independent of $x$.
- Now, we know that $\hat u$ is an estimate of $u$, so we can estimate this by OLS.

Feasible GLS (continued)
- Now, an estimate of $h$ is obtained as $\hat h = \exp(\hat g)$, and the inverse of this is our weight.
- So, what did we do?
  1. Run the original OLS model, save the residuals $\hat u$, square them, and take the log.
  2. Regress $\ln(\hat u^2)$ on all of the independent variables and get the fitted values $\hat g$.
  3. Do WLS using $1/\exp(\hat g)$ as the weight.

Example of FGLS: Demand for Cigarettes (Smoke.raw)
- What determines people's demand for cigarettes?
- Model (OLS): cigs = -3.64 + 0.88 log(income) - 0.75 log(cigpric) - 0.50 educ + 0.77 age - 0.009 age2 - 2.83 restaurn
- Use the Breusch-Pagan test for heteroskedasticity: get $\hat u^2$ and regress $\hat u^2$ on all independent variables. This gives F = 5.55 with p-value = 0, or $LM = 807 \times 0.04 = 32.28$ with p-value = 0.000014.
- Regress $\ln(\hat u^2)$ on all the independent variables and get the fitted values $\hat g$.
- Transform all the data by $1/\sqrt{\hat h} = \exp(-\hat g/2)$ and regress the transformed equation without a constant:
  cigs = 5.63 + 1.295 log(income) - 2.94 log(cigpric) - 0.463 educ + 0.482 age - 0.0056 age2 - 3.461 restaurn
- The income effect is now statistically significant and larger in magnitude. The estimates changed somewhat, but the basic story is still the same. Cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.
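The three FGLS steps can be sketched for a single regressor in Python. This is a hedged illustration, not the Smoke.raw analysis: the helper names `ols_simple` and `fgls_simple`, the simulated data, and the assumed variance function exp(0.5x) are all made up for the example.

```python
import math
import random

def ols_simple(xs, ys, w=None):
    """(Weighted) least squares of y on a constant and x; w holds the WLS weights 1/h_i."""
    w = w or [1.0] * len(xs)
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

def fgls_simple(xs, ys):
    """Feasible GLS assuming Var(u|x) = sigma^2 * exp(d0 + d1*x)."""
    b0, b1 = ols_simple(xs, ys)                                   # 1. original OLS
    log_usq = [math.log((y - b0 - b1 * x) ** 2) for x, y in zip(xs, ys)]
    d0, d1 = ols_simple(xs, log_usq)                              # 2. ln(u_hat^2) on x
    h_hat = [math.exp(d0 + d1 * x) for x in xs]                   #    h_hat = exp(g_hat)
    weights = [1.0 / h for h in h_hat]
    return ols_simple(xs, ys, weights), h_hat                     # 3. WLS, weights 1/h_hat

# Simulated data: y = 1 + 2x + u with sd(u|x) = exp(0.25*x) (assumed for illustration).
rng = random.Random(0)
xs = [rng.uniform(1.0, 10.0) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) * math.exp(0.25 * x) for x in xs]
(b0_fgls, b1_fgls), h_hat = fgls_simple(xs, ys)
print(f"FGLS intercept = {b0_fgls:.3f}, slope = {b1_fgls:.3f}")
```

Weighting each squared residual by 1/h_hat in `ols_simple` is equivalent to dividing every variable, including the constant, by the square root of h_hat and running OLS without a constant, which is how the cigarette example is set up.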

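Variance with Heteroskedasticity
- For the simple case, $\hat\beta_1 = \beta_1 + \dfrac{\sum_i (x_i - \bar x) u_i}{\sum_i (x_i - \bar x)^2}$, so $\mathrm{Var}(\hat\beta_1) = \dfrac{\sum_i (x_i - \bar x)^2 \sigma_i^2}{SST_x^2}$, where $SST_x = \sum_i (x_i - \bar x)^2$.
- A valid estimator for this when $\sigma_i^2 \neq \sigma^2$ is $\dfrac{\sum_i (x_i - \bar x)^2 \hat u_i^2}{SST_x^2}$, where $\hat u_i$ are the OLS residuals.

Variance with Heteroskedasticity (continued)
- For the general multiple regression model, a valid estimator of $\mathrm{Var}(\hat\beta_j)$ with heteroskedasticity is $\widehat{\mathrm{Var}}(\hat\beta_j) = \dfrac{\sum_i \hat r_{ij}^2 \hat u_i^2}{SSR_j^2}$, where $\hat r_{ij}$ is the $i$th residual from regressing $x_j$ on all other independent variables, and $SSR_j$ is the sum of squared residuals from this regression.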
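For the simple regression case, the robust variance formula is short enough to compute by hand. The Python sketch below implements it next to the usual homoskedasticity-only formula so the two can be compared; the function names and the artificial data are assumptions for illustration, and no small-sample correction is applied.

```python
import math

def ols_residuals(xs, ys):
    """Residuals from an OLS regression of y on a constant and x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    return [y - b0 - b1 * x for x, y in zip(xs, ys)]

def robust_var_slope(xs, resid):
    """Robust variance of the slope: sum (x_i - xbar)^2 * u_hat_i^2 / SST_x^2."""
    xbar = sum(xs) / len(xs)
    dev2 = [(x - xbar) ** 2 for x in xs]
    sst = sum(dev2)
    return sum(d * e * e for d, e in zip(dev2, resid)) / sst ** 2

def usual_var_slope(xs, resid):
    """Usual variance of the slope: s^2 / SST_x with s^2 = SSR / (n - 2)."""
    n = len(xs)
    xbar = sum(xs) / n
    sst = sum((x - xbar) ** 2 for x in xs)
    return sum(e * e for e in resid) / (n - 2) / sst

# Artificial data: y = 1 + 2x + u, with |u| proportional to x (assumed for illustration).
xs = list(range(1, 21))
ys = [1 + 2 * x + 0.5 * x * (-1) ** x for x in xs]
resid = ols_residuals(xs, ys)
print(f"robust se(slope) = {math.sqrt(robust_var_slope(xs, resid)):.4f}")
print(f"usual se(slope)  = {math.sqrt(usual_var_slope(xs, resid)):.4f}")
```

The robust formula weights each squared residual by its own squared deviation of x from the mean, whereas the usual formula pools all squared residuals into a single estimate of the error variance.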
