
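Multiple Regression Analysis

Outline
Omitted variable bias
The multiple regression model
Estimating the multiple regression model
The multiple regression model: an example
Analysis of variance and parameter tests
Some important issues in multiple regression

Omitted Variable Bias
From here on we no longer assume that the explanatory variables are fixed values; we treat them as random variables.
The simple regression model has only one explanatory variable. In most situations, however, the dependent variable Y is explained by more than one variable. For example, besides education, income may also depend on other variables such as work experience.
Moreover, when only one explanatory variable is considered, the estimate can suffer from omitted variable bias. Suppose the explanatory variable (e.g., education) is correlated with another variable (e.g., parental income): in general, the higher the parents' income, the better the education their children receive, so the children's educational attainment is higher. Suppose also that this other variable (parental income) directly affects the dependent variable (income): in general, the higher the parents' income, the more other resources they invest in their children, so the children's income is also higher.
Suppose the true model is
$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + \varepsilon_i,$$
where $X_i$ is education, $Z_i$ is parental income, and $\mathrm{Cov}(X_i, \varepsilon_i) = 0$. The estimated model omits $Z_i$:
$$Y_i = \alpha_0 + \alpha_1 X_i + u_i, \qquad u_i = \beta_2 Z_i + \varepsilon_i.$$
The covariance between $X_i$ and the error term is
$$\mathrm{Cov}(X_i, u_i) = \beta_2\,\mathrm{Cov}(X_i, Z_i).$$
Therefore the least squares slope of the estimated model satisfies
$$\operatorname{plim} \hat\alpha_1 = \frac{\mathrm{Cov}(X_i, Y_i)}{\mathrm{Var}(X_i)} = \beta_1 + \beta_2\,\frac{\mathrm{Cov}(X_i, Z_i)}{\mathrm{Var}(X_i)}.$$
Since $\beta_2\,\mathrm{Cov}(X_i, Z_i) \neq 0$, we have $\operatorname{plim} \hat\alpha_1 \neq \beta_1$: the simple regression does not recover the effect of education. In this example $\beta_2 > 0$ and $\mathrm{Cov}(X_i, Z_i) > 0$, so the simple regression overstates it.
An example of omitted variable bias: the Mozart effect?
Listening to Mozart for 10-15 minutes could raise IQ by 8 or 9 points (Nature, 1993).
Students who take optional music or arts courses in high school have higher English and math test scores than those who don't.

The Multiple Regression Model
We extend the simple regression model, which has a single explanatory variable, to the multiple regression model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + e_i,$$
where $X = (X_1, \ldots, X_k)$ are the k explanatory variables in the model, $e_i$ is the random disturbance term, and $\beta_0, \beta_1, \ldots, \beta_k$ are unknown parameters. The parameter $\beta_j$ measures the change in $E(Y \mid X)$ when $X_j$ rises by one unit with the other explanatory variables held fixed; that is, it is the net effect of the j-th explanatory variable on Y after controlling for the others.
[Figure: Population multiple regression model, bivariate case. The response plane with intercept β0 lies over the (x1, x2) plane; an observed y at (x1i, x2i) lies a vertical distance ei from the plane.]

Multiple Regression Model: Wage Income, Education, and Work Experience
The multiple regression model is
$$\text{Wage}_i = \beta_0 + \beta_1\,\text{Education}_i + \beta_2\,\text{Experience}_i + e_i,$$
and the simple regression model is
$$\text{Wage}_i = \alpha_0 + \alpha_1\,\text{Education}_i + e_i.$$
Both $\beta_1$ and $\alpha_1$ describe how education affects wage income, but they are interpreted differently. $\alpha_1$ measures the effect of education alone: when education rises by one unit (say, one year), wage income rises by $\alpha_1$ units. We know, however, that wage income depends on more than one explanatory variable. Once another plausible explanatory variable is included (here, work experience), $\beta_1$ is interpreted as follows: for a given level of work experience, one more unit of education raises wage income by $\beta_1$ units.

Estimating the Multiple Regression Model
To estimate the unknown parameters, we use observations that are independent across $i$. The least squares method chooses $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ to minimize the sum of squared residuals
$$\sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}\right)^2.$$
Setting the partial derivative with respect to each coefficient to zero gives k + 1 normal equations,
$$\sum_{i=1}^{n} \hat e_i = 0, \qquad \sum_{i=1}^{n} X_{ji}\,\hat e_i = 0 \quad (j = 1, \ldots, k),$$
where $\hat e_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \cdots - \hat\beta_k X_{ki}$; solving them gives $\hat\beta_0, \ldots, \hat\beta_k$. Many commercial packages, such as Excel, compute these estimates easily.

The Multiple Regression Model: An Example
阿中 (A-Zhong) is a logistics delivery driver who spends much of his time on the road delivering goods. His boss suspects that A-Zhong slacks off during breaks between deliveries, so the boss pulls the records of A-Zhong's past delivery trips and fits the multiple regression model
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i,$$
where Y = hours spent on the road, X1 = delivery route distance (km), and X2 = number of delivery stops.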
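The estimation step described above (solve the k + 1 normal equations) can be sketched without any libraries. This is a minimal illustration, not the software the slides mention: it forms X'X and X'y for a model with an intercept and solves the system by Gaussian elimination. The trip data are made up, and the hours are generated exactly from Ŷ = −0.39 + 0.066 X1 + 0.694 X2, so the solver should return those coefficients.

```python
def ols(rows, y):
    """Least squares coefficients [b0, b1, ..., bk] for y on the given rows.

    Each element of rows holds one observation's explanatory variables;
    an intercept column of ones is added automatically.
    """
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])  # k + 1 unknowns
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(x[a] * x[c] for x in X) for c in range(p)]
         + [sum(x[a] * yi for x, yi in zip(X, y))] for a in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: perfectly collinear regressors")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


# Made-up delivery trips: route distance (km) and number of stops.
distance = [60, 75, 90, 100, 110, 80, 95, 70, 120, 85]
stops = [3, 4, 5, 4, 6, 2, 5, 3, 6, 4]
hours = [-0.39 + 0.066 * d + 0.694 * s for d, s in zip(distance, stops)]

coef = ols(list(zip(distance, stops)), hours)
print([round(c, 3) for c in coef])  # [-0.39, 0.066, 0.694]
```

With real records the hours would not lie exactly on a plane, and the same function would return the least squares compromise instead of the generating coefficients.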

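The boss estimates the regression model
$$\hat{Y} = -0.39 + 0.066\,X_1 + 0.694\,X_2.$$
Holding the number of stops fixed, each additional kilometer of route adds 0.066 hours on the road; holding the route distance fixed, each additional stop adds 0.694 hours.
Each coefficient is tested with $t_j = \hat\beta_j / \mathrm{se}(\hat\beta_j)$. In this example the sample has n = 10 trips, so the t distribution has n − (k + 1) = 10 − (2 + 1) = 7 degrees of freedom, and its two-sided critical values at significance levels α = 1%, 5%, and 10% are 3.499, 2.365, and 1.895. Accordingly, $\hat\beta_1$ is significant at the 1% level, and $\hat\beta_2$ is significant at the 10% level. Route distance and number of stops are significant both economically and statistically; that is, both are important explanatory variables for hours on the road.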
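A small sketch of the decision rule described above: compute t = b̂ / se(b̂) and find the strictest of the three tabulated levels at which |t| exceeds the critical value for 7 degrees of freedom. The standard errors in the usage lines are hypothetical, chosen only to exercise the function.

```python
# Two-sided critical values of the t distribution with 7 degrees of freedom,
# as quoted for the delivery example (n = 10, k = 2).
CRITICAL_T_7DF = {0.01: 3.499, 0.05: 2.365, 0.10: 1.895}


def t_statistic(estimate, std_error):
    """t statistic for H0: beta_j = 0."""
    return estimate / std_error


def smallest_rejecting_level(t, critical=CRITICAL_T_7DF):
    """Smallest tabulated significance level at which H0 is rejected, else None."""
    for alpha in sorted(critical):  # 0.01 first: the strictest level
        if abs(t) > critical[alpha]:
            return alpha
    return None


# Hypothetical standard errors, for illustration only.
t1 = t_statistic(0.066, 0.012)   # about 5.5
t2 = t_statistic(0.694, 0.33)    # about 2.10
print(smallest_rejecting_level(t1), smallest_rejecting_level(t2))  # 0.01 0.1
```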

With these estimates in hand, suppose the boss knows that A-Zhong has 5 stops to make today over a total route of 110 km. The boss can then predict A-Zhong's hours on the road today as
$$\hat{Y} = -0.39 + 0.066 \times 110 + 0.694 \times 5 \approx 10.35 \text{ hours}$$
(10.34 with the rounded coefficients shown). If A-Zhong is out for 12 hours today, the boss can reasonably suspect that he spent close to 2 hours (12 − 10.35 = 1.65) slacking off. This example illustrates the two main uses of a regression model: explanation and prediction.

23.2 Interpreting Multiple Regression
Example: Women's Apparel Stores
Response variable: sales at stores in a chain of women's apparel (annually in dollars per square foot of retail space).
Two explanatory variables: median household income in the area (thousands of dollars) and number of competing apparel stores in the same mall.
Copyright 2011 Pearson Education, Inc.
Begin with a scatterplot matrix, a table of scatterplots arranged as in a correlation matrix. Using a scatterplot matrix to understand the data can save considerable time later when interpreting the multiple regression results.
The scatterplot matrix for this example
confirms a positive linear association between sales and median household income;
shows a weak association between sales and the number of competitors.
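The numerical counterpart of a scatterplot matrix is a correlation matrix. A minimal sketch that computes the Pearson correlation for every pair of columns; the three short columns are invented for illustration and are not the apparel-store data.

```python
from math import sqrt


def pearson(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def correlation_matrix(columns):
    """Dict mapping (name_a, name_b) to the correlation of those two columns."""
    return {(a, b): pearson(columns[a], columns[b])
            for a in columns for b in columns}


# Invented numbers, for illustration only.
data = {
    "sales": [420, 510, 480, 600, 560, 650],
    "income": [50, 60, 58, 72, 66, 80],
    "competitors": [1, 3, 2, 2, 4, 3],
}
corr = correlation_matrix(data)
for a in data:
    print(f"{a:>12}", " ".join(f"{corr[(a, b)]:6.2f}" for b in data))
```

The table is symmetric with ones on the diagonal, which is why it mirrors the layout of the scatterplot matrix.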

[Table: Correlation Matrix: Women's Apparel Stores]
[Figure: Partial Slopes: Women's Apparel Stores]
Marginal and Partial Slopes
Partial slope: slope of an explanatory variable in a multiple regression that statistically excludes the effects of the other explanatory variables.
Marginal slope: slope of an explanatory variable in a simple regression.
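A minimal sketch of how marginal and partial slopes can disagree when the explanatory variables are correlated. The data are invented and noise-free, generated from sales = 60 + 8·income − 24·competitors with competitors rising with income; these coefficients are illustrative, not the apparel-store estimates. The partial slopes solve the two-regressor normal equations in centered form, and the marginal slope comes from a simple regression. The final check is the in-sample form of the omitted-variable-bias formula: marginal slope = partial slope + (other partial slope) × (slope of the other regressor on this one).

```python
def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def partial_slopes(x1, x2, y):
    """Slopes (b1, b2) from the multiple regression of y on x1 and x2."""
    d1, d2, dy = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(d1, d1), dot(d2, d2), dot(d1, d2)
    s1y, s2y = dot(d1, dy), dot(d2, dy)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def marginal_slope(x, y):
    """Slope from the simple regression of y on x."""
    dx, dy = center(x), center(y)
    return dot(dx, dy) / dot(dx, dx)


# Invented data: richer locations attract more competitors.
income = [40, 50, 55, 60, 65, 70, 80, 90]  # thousands of dollars
competitors = [1, 2, 2, 3, 3, 4, 5, 6]
sales = [60 + 8 * i - 24 * c for i, c in zip(income, competitors)]

b_income, b_comp = partial_slopes(income, competitors, sales)
m_comp = marginal_slope(competitors, sales)
print(round(b_income, 3), round(b_comp, 3), round(m_comp, 3))  # 8.0 -24.0 52.923

# Omitted-variable identity: marginal = partial + b_income * slope(income on competitors)
print(abs(m_comp - (b_comp + b_income * marginal_slope(competitors, income))) < 1e-9)  # True
```

Here the partial slope for competitors is negative while the marginal slope is positive, the same kind of sign reversal the slides attribute to affluent locations drawing more competitors.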

23.4 Inference in Multiple Regression
Inference for One Coefficient
The t-statistic is used to test each slope using the null hypothesis H0: βj = 0. The t-statistic is calculated as
$$t = \frac{b_j - 0}{\mathrm{se}(b_j)}.$$
t-test Results for Women's Apparel Stores
The t-statistics and associated p-values indicate that both slopes are significantly different from zero.
Prediction Intervals
An approximate 95% prediction interval is given by
$$\hat{y} \pm 2\,s_e.$$
For example, the 95% prediction interval for sales per square foot at a location with median income of $70,000 and 3 competitors is approximately $545.47 ± $136.06 per square foot.
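The prediction above is the fitted equation evaluated at Income = 70 (thousands of dollars) and Competitors = 3, widened by the rule-of-thumb ± 2 s_e. A minimal sketch using the coefficients 60.3587, 7.966, and −24.165 and s_e = 68.03 reported for this example; like the slide's formula, this rough interval ignores the extra uncertainty from estimating the coefficients.

```python
def predict_sales(income_thousands, competitors):
    """Fitted sales per square foot from the women's apparel regression."""
    return 60.3587 + 7.966 * income_thousands - 24.165 * competitors


def approx_prediction_interval(y_hat, s_e):
    """Rule-of-thumb 95% prediction interval y_hat +/- 2 s_e."""
    return y_hat - 2 * s_e, y_hat + 2 * s_e


y_hat = predict_sales(70, 3)
low, high = approx_prediction_interval(y_hat, s_e=68.03)
print(f"{y_hat:.2f} (about {low:.2f} to {high:.2f})")  # 545.48 (about 409.42 to 681.54)
```

The point prediction differs from the slide's $545.47 by a cent, presumably because the slide used unrounded coefficients; the half-width 2 × 68.03 = 136.06 matches exactly.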

Partial Slopes: Women's Apparel Stores
The slope b1 = 7.966 for Income implies that a store in a location where median household income is $10,000 higher sells, on average, $79.66 more per square foot than a store in a less affluent location with the same number of competitors.
The slope b2 = −24.165 implies that, among stores in equally affluent locations, each additional competitor lowers average sales by $24.165 per square foot.
Marginal and Partial Slopes
Partial and marginal slopes agree only when the explanatory variables are uncorrelated. In this example they do not agree. For instance, the marginal slope for Competitors is 4.6352. It is positive because more affluent locations tend to draw more competitors. The MRM separates these effects, but the SRM does not.

23.3 Checking Conditions
Conditions for Inference
Use the residuals from the fitted MRM to check that the errors in the model
are independent;
have equal variance; and
follow a normal distribution.
Calibration Plot
Calibration plot: scatterplot of the response y on the fitted values ŷ.
R² is the square of the correlation between y and ŷ; the more tightly the data cluster along the diagonal line in the calibration plot, the larger R².
[Figure: Calibration Plot: Women's Apparel Stores]
Residual Plots
A plot of residuals versus fitted y values is used to identify outliers and to check the similar-variances condition.
Plots of residuals versus each explanatory variable are used to verify that the relationships are linear.
Residual Plot: Women's Apparel Stores
This plot of residuals versus fitted values of y has no evident pattern.
This plot of residuals versus Income has no evident pattern.
Check Normality: Women's Apparel Stores
The quantile plot indicates that the nearly normal condition is satisfied.

Analysis of Variance and Parameter Tests
The analysis of variance table for the simple regression model extends directly to the multiple regression setting. Let $TV = \sum_i (Y_i - \bar Y)^2$ be the total variation of Y and $UV = \sum_i \hat e_i^2$ the unexplained variation. The degrees of freedom of UV become n − k − 1, because estimating the parameters $\beta_0, \beta_1, \ldots, \beta_k$ uses up k + 1 degrees of freedom.
F test
As discussed in the previous chapter, to test the null hypothesis
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
we can use the F test
$$F = \frac{(TV - UV)/k}{UV/(n - k - 1)},$$
which follows the $F(k, n - k - 1)$ distribution under $H_0$. At significance level α, we reject the null hypothesis when $F > F_{\alpha}(k, n - k - 1)$.
We can then compute the coefficient of determination
$$R^2 = 1 - \frac{UV}{TV},$$
that is, the proportion of the total variation in Y that is explained by the regression model.
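A minimal sketch of the ANOVA arithmetic just described: given TV, UV, n, and k, it returns R² and the F statistic, and a second helper writes the same F in terms of R² as (R²/k) / ((1 − R²)/(n − k − 1)). The sums of squares in the usage line are hypothetical; comparing F with the critical value F_α(k, n − k − 1) is left to a table or a statistics library.

```python
def anova_r2_f(tv, uv, n, k):
    """R^2 and F statistic from total variation TV and unexplained variation UV."""
    df_resid = n - k - 1  # k + 1 parameters are estimated
    if df_resid <= 0:
        raise ValueError("need n > k + 1")
    r2 = 1 - uv / tv
    f = ((tv - uv) / k) / (uv / df_resid)
    return r2, f


def f_from_r2(r2, n, k):
    """The same F statistic written in terms of R^2."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical sums of squares for n = 10 observations and k = 2 regressors.
r2, f = anova_r2_f(tv=100.0, uv=40.0, n=10, k=2)
print(round(r2, 3), round(f, 3), round(f_from_r2(r2, 10, 2), 3))  # 0.6 5.25 5.25
```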

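However, each time an explanatory variable is added to the multiple regression model, UV decreases (or stays the same), so R² increases (or stays the same).
Why does UV fall when an explanatory variable is added? Suppose the model originally has two explanatory variables and we add a third, so the minimization problem becomes
$$\min_{b_0, b_1, b_2, b_3} \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i} - b_3 X_{3i}\right)^2.$$
If the minimizing $\hat b_3$ happens to be zero, UV equals the value obtained with only the two original explanatory variables. If $\hat b_3$ is not zero, then $UV_{(3)} < UV_{(2)}$: setting $b_3 = 0$ is one of the available choices, so the three-variable minimum can be no larger than the two-variable minimum, and it is strictly smaller when the minimizer has $\hat b_3 \neq 0$. In other words, adding an explanatory variable lowers UV and therefore raises R².
Hence judging a multiple regression model by R² has a serious flaw: the more explanatory variables we include, the better the model appears to explain Y. If we keep adding explanatory variables without limit, or add irrelevant ones, the model's explanatory power never falls (it rises or stays the same), yet doing so is meaningless.
Adjusted Coefficient of Determination
To remedy this flaw, we use the adjusted coefficient of determination
$$\bar R^2 = 1 - \frac{UV/(n - k - 1)}{TV/(n - 1)} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}.$$
$\bar R^2$ penalizes additional explanatory variables: when k rises, R² increases or stays the same, but the penalty factor $(n - 1)/(n - k - 1)$ also increases, which pulls $\bar R^2$ down. Measuring goodness of fit with the adjusted R² therefore does not lead to the conclusion that more explanatory variables are always better.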
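A minimal sketch of the penalty at work. The R² values are hypothetical: suppose that with n = 20 observations, adding a third explanatory variable raises R² only from 0.60 to 0.61. R² rises, but the adjusted R² falls.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical fits: adding a third regressor barely raises R^2.
before = adjusted_r2(0.60, n=20, k=2)
after = adjusted_r2(0.61, n=20, k=3)
print(round(before, 4), round(after, 4))  # 0.5529 0.5369
```

A larger gain in R² from the new variable would outweigh the penalty and raise the adjusted R² instead.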

R-squared and s_e
The equation of the fitted model for estimating sales in the women's apparel stores example is
$$\widehat{\text{Sales}} = 60.3587 + 7.966\,\text{Income} - 24.165\,\text{Competitors}.$$
R² indicates that the fitted equation explains 59.47% of the store-to-store variation in sales.
For this example, R² is larger than the r² values of the separate SRMs fitted for each explanatory variable; it is also larger than their sum.
For this example, s_e = $68.03.
R̄² is known as the adjusted R-squared. It adjusts for both sample size n and model size k, and it is always smaller than R². The residual degrees of freedom (n − k − 1) is the divisor of s_e. Because R̄² = 1 − s_e²/s_y², where s_y² is the sample variance of y, R̄² and s_e move in opposite directions when an explanatory variable is added to the model: if R̄² goes up, s_e goes down, and vice versa.
Inference for the Model: F-test
F-test: test of the explanatory power of the MRM as a whole.
F-statistic: ratio of the sample variance of the fitted values to the variance of the residuals.
The F-statistic
$$F = \frac{R^2}{1 - R^2}\cdot\frac{n - k - 1}{k}$$
is used to test the null hypothesis that all slopes are equal to zero, i.e., H0: β1 = β2 = … = βk = 0.
F-test Results in Analysis of Variance Table
The F-statistic has a p-value of 0.0001; reject H0. Income and Competitors together explain statistically significant variation in sales.
Steps in Fitting a Multiple Regression
What is the problem to be solved? Do these data help in solving it?
Check the scatterplots of the response versus each explanatory variable (scatterplot matrix).
If the scatterplots appear straight enough, fit the multiple regression model. Otherwise, find a transformation.
Obtain the residuals and fitted values from the regression.
Use the residual plot of e vs. ŷ to check the similar-variances condition.
Construct residual plots of e vs. the explanatory variables. Look for patterns.
Check whether the residuals are nearly normal.
Use the F-statistic to test the null hypothesis that the collection of explanatory variables has no effect on the response.
If the F-statistic is statistically significant, test and interpret the individual partial slopes.
4M Example 23.1: SUBPRIME MORTGAGES
Motivation
A banking regulator would like to verify how lenders use credit scores to determine the interest rate paid by subprime borrowers. The regulator would like to separate the effect of credit scores from other variables such as the loan-to-value (LTV) ratio, the income of the borrower, and the value of the home.
Method
Use multiple regression on data obtained for 372 mortgages from a credit bureau. The explanatory variables are the LTV, credit score, income of the borrower, and home value. The response is the annual percentage rate of interest on the loan (APR
