
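Regression Assignment: A Regression Analysis of Factors Influencing Domestic Tourism Consumption

I. Problem Statement

China's tertiary (service) sector has grown rapidly. In 2010 it accounted for 43.14% of gross domestic product. Tourism holds an important place within the tertiary sector and is closely linked to catering, accommodation, leisure and transport. This analysis therefore sets out to identify the factors that influence domestic tourism consumption and to build a regression model for it.

II. Model Design

A multiple linear model is fitted first. If the fit is not significant, the analysis tries a log or square-root transformation, a polynomial fit or another model instead. The procedure is:

1. Correlation analysis. Identify the variables that are correlated with the response.
2. Full model. Fit a multiple linear regression on all candidate variables. If the overall F test fails, find the cause and change the model. If some coefficients fail their t tests, carry out variable selection (step 3) and drop some variables before continuing. If all tests pass, the model is tentatively established and step 3 is skipped.
3. Variable selection. Screen the variables by stepwise regression and run t tests on the result. If all coefficients are significant, the multiple linear model is tentatively established. If some still fail, screen the remaining variables individually, using the AIC criterion among other tools to decide which to drop, until every variable passes its t test.
4. Regression diagnostics. Analyse the residuals and check the following:
   - whether they are normally distributed;
   - whether they are autocorrelated;
   - whether there are outliers or influential observations;
   - whether there is heteroscedasticity;
   - whether the regressors are multicollinear.
   If any of these problems is present, revise the model, re-select the variables, or add or remove observations.
5. Final model. Establish the final model.

III. Data

```
year    income  number  expense   level    road  rail
1994   48108.5     524    195.3   320.0  111.78  5.90
1995   59810.5     629    218.7   345.1  115.70  6.24
1996   70142.5     640    256.2   377.6  118.58  6.49
1997   78060.9     644    328.1   394.6  122.64  6.60
1998   83024.3     695    345.0   417.8  127.85  6.64
1999   88479.2     719    394.0   452.3  135.17  6.74
2000   98000.5     744    426.6   491.0  140.27  6.87
2001  108068.2     784    449.5   521.2  169.80  7.01
2002  119095.7     878    441.8   557.6  176.52  7.19
2003  135174.0     870    395.7   596.9  180.98  7.30
2004  159586.8    1102    427.5   645.3  187.07  7.44
2005  183618.5    1212    436.1   695.2  334.52  7.54
2006  215883.9    1394    446.9   761.9  345.70  7.71
2007  266411.0    1610    482.6   843.4  358.37  7.80
2008  315274.7    1712    511.0   916.8  373.02  7.97
2009  341401.5    1902    535.4  1001.6  386.08  8.55
2010  403260.0    2103    598.2  1062.6  400.82  9.12
```

```
year     air  railtran  roadtran  shiptran  airtran   travel
1994  104.56    108738    953940     26165     4039   1023.5
1995  112.90    102745   1040810     23924     5117   1375.7
1996  116.65     94797   1122110     22895     5555   1638.4
1997  142.50     93308   1204583     22573     5630   2112.7
1998  150.58     95085   1257332     20545     5755   2391.2
1999  152.22    100164   1269004     19151     6094   2831.9
2000  150.29    105073   1347392     19386     6722   3175.5
2001  155.36    105155   1402798     18645     7524   3522.4
2002  163.77    105606   1475257     18693     8594   3878.4
2003  174.95     97260   1464335     17142     8759   3442.3
2004  204.94    111764   1624526     19040    12123   4710.7
2005  199.85    115583   1697381     20227    13827   5285.9
2006  211.35    125656   1860487     22047    15968   6229.7
2007  234.30    135670   2050680     22835    18576   7770.6
2008  246.18    146193   2682114     20334    19251   8749.3
2009  234.51    152451   2779081     22314    23052  10183.7
2010  276.51    167609   3052738     22392    26769  12579.8
```

Data source: China Statistical Yearbook 2011.

Variable definitions:
- year: year.
- income: gross national income, in 100 million yuan.
- number: number of tourists.
- expense: per capita tourist spending, in yuan.
- level: household consumption level index, with 1978 as the base year.
- road: highway length, in 10,000 km.
- rail: railway length, in 10,000 km.
- air: civil aviation route length, in 10,000 km.
- roadtran: highway passenger volume, in 10,000 passengers.
- railtran: railway passenger volume, in 10,000 passengers.
- shiptran: waterway passenger volume, in 10,000 passengers.
- airtran: civil aviation passenger volume, in 10,000 passengers.
- travel: total domestic tourism consumption, in 100 million yuan.

IV. Regression Analysis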
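Before the regression analysis, it is worth running a quick consistency check on the data in Section III. In every year, travel appears to equal number × expense / 100 to within about 0.3%. This is what one would expect if number were measured in millions of person-trips: million trips × yuan gives million yuan, and 100 million yuan is the unit of travel. The source gives no unit for number, so that unit is an assumption here. The sketch below only restates the three columns from the tables above. The helper `implied_travel` is hypothetical and introduced for illustration.

```python
# Columns copied from the Section III tables (1994-2010).
# ASSUMPTION: number is in millions of person-trips (the source states no unit).
years = list(range(1994, 2011))
number = [524, 629, 640, 644, 695, 719, 744, 784, 878, 870,
          1102, 1212, 1394, 1610, 1712, 1902, 2103]
expense = [195.3, 218.7, 256.2, 328.1, 345.0, 394.0, 426.6, 449.5, 441.8,
           395.7, 427.5, 436.1, 446.9, 482.6, 511.0, 535.4, 598.2]
travel = [1023.5, 1375.7, 1638.4, 2112.7, 2391.2, 2831.9, 3175.5, 3522.4,
          3878.4, 3442.3, 4710.7, 5285.9, 6229.7, 7770.6, 8749.3, 10183.7,
          12579.8]


def implied_travel(n, e):
    """Hypothetical helper: million trips x yuan per trip = million yuan;
    dividing by 100 converts to units of 100 million yuan (the unit of travel)."""
    return n * e / 100.0


# Relative gap between the implied total and the reported travel value.
rel_errors = [abs(implied_travel(n, e) - t) / t
              for n, e, t in zip(number, expense, travel)]

for y, n, e, t, r in zip(years, number, expense, travel, rel_errors):
    print(f"{y}: number*expense/100 = {implied_travel(n, e):9.1f}  "
          f"travel = {t:8.1f}  rel. diff = {r:.4%}")
print(f"largest relative difference: {max(rel_errors):.4%}")
```

With these values the largest gap is about 0.27%, in 1998. If this near-identity is real, travel is almost a deterministic product of two of the candidate regressors. That may partly explain the R² values near 0.9998 reported below, and it is worth keeping in mind when interpreting the fitted coefficients.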

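1. Correlation

The correlations are examined first with a scatterplot matrix of all the variables. The matrix suggests that travel is fairly strongly correlated with every other variable except road and shiptran. Formal correlation tests show, however, that travel and road are linearly correlated (r = 0.93, p-value = 4.563e-08), whereas travel and shiptran are not (p-value = 0.9983). shiptran is therefore excluded before the regression is fitted.

2. Full regression model

Fitting a multiple linear regression directly on all remaining candidate variables gives:

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.972e+03  3.193e+03  -1.870 0.110617    
income       2.151e-02  4.779e-03   4.501 0.004100 ** 
number       1.039e+00  1.446e+00   0.719 0.499354    
expense      6.805e+00  1.124e+00   6.052 0.000922 ***
level       -5.815e+00  1.261e+00  -4.610 0.003653 ** 
road        -1.468e+00  1.019e+00  -1.441 0.199608    
rail         6.274e+02  4.462e+02   1.406 0.209292    
air         -4.155e+00  2.790e+00  -1.490 0.186935    
railtran     2.524e-02  8.492e-03   2.972 0.024903 *  
roadtran    -4.093e-04  4.554e-04  -0.899 0.403410    
airtran      1.058e-01  1.272e-01   0.832 0.437327    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 84.55 on 6 degrees of freedom
Multiple R-squared:  0.9998,  Adjusted R-squared:  0.9994 
F-statistic:  2462 on 10 and 6 DF,  p-value: 5.061e-10
```

Here R² = 0.9998 and the p-value of the F test is 5.061e-10, so the regression as a whole is significant. Several coefficients, however, fail their t tests, so variable selection is needed.

3. Variable selection

Stepwise regression removes number and then roadtran. With AIC as the main criterion, the AIC reaches its minimum of 153.73 at this point. Testing the resulting model again gives:

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.393e+03  2.102e+03  -2.090 0.070022 .  
income       1.898e-02  2.320e-03   8.179 3.72e-05 ***
expense      7.038e+00  9.369e-01   7.512 6.85e-05 ***
level       -5.427e+00  1.057e+00  -5.133 0.000893 ***
road        -1.460e+00  9.339e-01  -1.564 0.156518    
rail         3.697e+02  2.865e+02   1.290 0.232935    
air         -3.589e+00  2.496e+00  -1.438 0.188431    
railtran     2.166e-02  6.843e-03   3.165 0.013295 *  
airtran      2.032e-01  5.464e-02   3.719 0.005879 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 78.95 on 8 degrees of freedom
Multiple R-squared:  0.9997,  Adjusted R-squared:  0.9994 
F-statistic:  3529 on 8 and 8 DF,  p-value: 2.252e-13
```

The model improves. The adjusted R² stays at 0.9994 while the residual standard error falls from 84.55 to 78.95, which agrees with the AIC criterion. The coefficient tests also improve, but road, rail and air still fail. Removing one variable at a time gives:

```
         Df Sum of Sq    RSS    AIC
<none>                 49866 153.73
income    1    416943 466809 189.75
expense   1    351763 401629 187.19
level     1    164237 214103 176.50
road      1     15241  65107 156.26
rail      1     10380  60246 154.94
air       1     12886  62752 155.63
railtran  1     62438 112303 165.53
airtran   1     86215 136081 168.79
```

Removing rail gives the smallest increase in both AIC and RSS. Refitting without rail gives these coefficient tests:

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.773e+03  5.648e+02  -3.140 0.011936 *  
income       1.935e-02  2.386e-03   8.112 1.98e-05 ***
expense      7.977e+00  6.116e-01  13.043 3.77e-07 ***
level       -5.126e+00  1.069e+00  -4.797 0.000978 ***
road        -2.214e+00  7.550e-01  -2.933 0.016676 *  
air         -5.129e+00  2.272e+00  -2.257 0.050398 .  
railtran     1.495e-02  4.613e-03   3.241 0.010144 *  
airtran      2.603e-01  3.323e-02   7.832 2.62e-05 ***
```

Only air fails at α = 0.05. Removing air as well gives:

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.450e+03  5.683e+02  -4.310  0.00154 ** 
income       1.834e-02  2.782e-03   6.593 6.13e-05 ***
expense      7.465e+00  6.742e-01  11.072 6.21e-07 ***
level       -5.389e+00  1.261e+00  -4.273  0.00163 ** 
road        -2.381e+00  8.921e-01  -2.669  0.02355 *  
railtran     1.933e-02  4.970e-03   3.889  0.00301 ** 
airtran      2.451e-01  3.864e-02   6.343 8.42e-05 ***
```

All coefficients now pass their tests, and the model is tentatively established.

4. Regression diagnostics

Normality. The residuals are computed and checked with the Shapiro-Wilk (W) normality test. The test gives p-value = 0.9066, so the normality assumption cannot be rejected.

Residual pattern. The plot of standardized residuals against fitted values shows the residuals scattered evenly with no visible pattern. The basic assumptions of the linear model therefore appear to hold, and there is no sign of autocorrelation.

Influential observations. Taken together, the four diagnostic plots produced by plot(model4) suggest that observations 11 and 15 may be influential. Their cause still needs investigating; it could lie in how the statistics were collected or in the method of analysis. Removing them weakens the fit, so they are kept for now.

Multicollinearity. The condition number of the correlation matrix of the regressors is kappa = 1366.411 > 1000, so multicollinearity is present. The variables are taken in the order income, expense, level, road, railtran, airtran. The two eigenvalues closest to zero and their eigenvectors are:

```
0.004087919: ( 0.745121692,  0.070209783, -0.602338494,  0.133464995, -0.142560576, -0.19727183)
0.005567391: (-0.264478984,  0.115775260, -0.550564160,  0.004567634, -0.073879174,  0.779773728)
```

The large components are in positions 1, 3 and 6. This means income, level and airtran may be seriously collinear, most likely income and level. The relationship makes economic sense. Higher national income goes with a higher consumption level, which in turn goes with more air travel; of these, the first two are the more directly related.

The likely cause is a redundant regressor. Income, level and airtran are therefore each removed in turn, the regression is refitted, and kappa is recomputed. Whichever variable is removed, kappa falls by roughly half:
- without level: 529.95;
- without income: 537.99;
- without airtran: 624.65.

These three values come from kappa() with its default approximate computation, whereas 1366.411 used exact = TRUE, so the comparison is only indicative. Only removing level leaves the rest of the model essentially intact. Every remaining coefficient stays significant, and R² changes only from 0.9995 to 0.9985:

```
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.824e+03  7.511e+02  -5.091 0.000349 ***
income       1.217e-02  3.811e-03   3.194 0.008552 ** 
expense      5.483e+00  7.843e-01   6.991  2.3e-05 ***
road        -4.247e+00  1.247e+00  -3.407 0.005855 ** 
railtran     2.708e-02  7.416e-03   3.651 0.003811 ** 
airtran      1.929e-01  5.876e-02   3.284 0.007288 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 155.7 on 11 degrees of freedom
Multiple R-squared:  0.9985,  Adjusted R-squared:  0.9978 
F-statistic:  1450 on 5 and 11 DF,  p-value: 4.078e-15
```

level can therefore be removed.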
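The multicollinearity reasoning in step 4 can be sketched from the printed eigenvalues and eigenvectors alone. That reasoning computes a condition number, then reads off which variables load heavily on the eigenvectors of the near-zero eigenvalues. For a symmetric positive definite matrix such as the correlation matrix bb, the ratio of the largest to the smallest eigenvalue is the 2-norm condition number. That is the quantity kappa(bb, exact = TRUE) reports here. The |component| > 0.5 cut-off used to flag variables is an assumption chosen for illustration, not a standard rule. The names `condition_number` and `dominant_variables` are hypothetical.

```python
# Values copied from the eigen(bb) output; bb is the correlation matrix of
# income, expense, level, road, railtran and airtran (in that order).
NAMES = ["income", "expense", "level", "road", "railtran", "airtran"]
EIGENVALUES = [5.585778007, 0.284630469, 0.091524258,
               0.028411957, 0.005567391, 0.004087919]
# (eigenvalue, eigenvector) pairs for the two eigenvalues closest to zero.
NEAR_ZERO = [
    (0.004087919, [0.745121692, 0.070209783, -0.602338494,
                   0.133464995, -0.142560576, -0.19727183]),
    (0.005567391, [-0.264478984, 0.115775260, -0.550564160,
                   0.004567634, -0.073879174, 0.779773728]),
]


def condition_number(eigenvalues):
    """Largest over smallest eigenvalue: the 2-norm condition number of a
    symmetric positive definite matrix."""
    return max(eigenvalues) / min(eigenvalues)


def dominant_variables(vector, names, threshold=0.5):
    """Names whose eigenvector component exceeds `threshold` in absolute value.
    The 0.5 cut-off is an illustrative assumption."""
    return [name for name, c in zip(names, vector) if abs(c) > threshold]


kappa = condition_number(EIGENVALUES)
print(f"kappa = {kappa:.3f}")

involved = set()
for eigenvalue, vector in NEAR_ZERO:
    flagged = dominant_variables(vector, NAMES)
    involved.update(flagged)
    print(f"eigenvalue {eigenvalue}: large components for {flagged}")
print("variables in near-dependencies:", sorted(involved))
```

With the printed values this gives kappa ≈ 1366.41 and flags income, level and airtran. These are the same three variables identified in step 4.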

Heteroscedasticity. This is checked with the rank correlation method: the Spearman rank correlations between the absolute residuals and the regressors are 0.2156863, 0.05637255, 0.2156863, 0 and 0.2156863. None of them indicates a correlation, so there is no evidence of heteroscedasticity and the model fits well.

5. Final model

travel = -3.824e+03 + 1.217e-02 * income + 5.483 * expense - 4.247 * road + 2.708e-02 * railtran + 1.929e-01 * airtran

V. Comments on the Model

According to the model, domestic tourism consumption can be modelled and predicted from five variables:
- gross national income;
- per capita tourist spending;
- railway passenger volume;
- civil aviation passenger volume;
- highway length.

This agrees with practical intuition. The first two reflect people's living standards. The last three reflect national transport infrastructure, and together they cover all three modes: road, rail and air. The fitted equation is therefore broadly consistent with its practical meaning, and the main influencing factors are essentially identified. However, because the set of candidate regressors was fixed at the outset, some important variables may have been left out.

VI. Program Code and Output (language: R)

```
> x <- read.csv("数据.csv", header = TRUE)   # data file
> attach(x)            # moved up: cor.test() and lm() below refer to columns by name
> a <- x[, 2:13]
> plot(a)              # scatterplot matrix
> cor.test(road, travel)   # correlation test

	Pearson's product-moment correlation

data:  road and travel
t = 10.0692, df = 15, p-value = 4.563e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8209980 0.9761007
sample estimates:
      cor 
0.9333393 

> cor.test(shiptran, travel)

	Pearson's product-moment correlation

data:  shiptran and travel
t = 0.0021, df = 15, p-value = 0.9983
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4802217  0.4810676
sample estimates:
         cor 
0.0005500457 

> model <- lm(travel ~ income + number + expense + level + road + rail + air + railtran + roadtran + airtran)   # fit the regression model
> summary(model)

Call:
lm(formula = travel ~ income + number + expense + level + road + 
    rail + air + railtran + roadtran + airtran)

Residuals:
    Min      1Q  Median      3Q     Max 
-72.549 -44.860   3.562  44.806  90.603 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -5.972e+03  3.193e+03  -1.870 0.110617    
income       2.151e-02  4.779e-03   4.501 0.004100 ** 
number       1.039e+00  1.446e+00   0.719 0.499354    
expense      6.805e+00  1.124e+00   6.052 0.000922 ***
level       -5.815e+00  1.261e+00  -4.610 0.003653 ** 
road        -1.468e+00  1.019e+00  -1.441 0.199608    
rail         6.274e+02  4.462e+02   1.406 0.209292    
air         -4.155e+00  2.790e+00  -1.490 0.186935    
railtran     2.524e-02  8.492e-03   2.972 0.024903 *  
roadtran    -4.093e-04  4.554e-04  -0.899 0.403410    
airtran      1.058e-01  1.272e-01   0.832 0.437327    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 84.55 on 6 degrees of freedom
Multiple R-squared:  0.9998,  Adjusted R-squared:  0.9994 
F-statistic:  2462 on 10 and 6 DF,  p-value: 5.061e-10

> model1 <- step(model)   # stepwise regression
Start:  AIC=155.17
travel ~ income + number + expense + level + road + rail + air + 
    railtran + roadtran + airtran

           Df Sum of Sq    RSS    AIC
- number    1      3693  46589 154.57
- airtran   1      4948  47844 155.02
<none>                   42897 155.17
- roadtran  1      5775  48671 155.31
- rail      1     14137  57033 158.01
- road      1     14850  57746 158.22
- air       1     15862  58758 158.52
- railtran  1     63136 106033 168.55
- income    1    144834 187731 178.26
- level     1    151949 194845 178.90
- expense   1    261858 304755 186.50

Step:  AIC=154.57
travel ~ income + expense + level + road + rail + air + railtran + 
    roadtran + airtran

           Df Sum of Sq    RSS    AIC
- roadtran  1      3276  49866 153.73
<none>                   46589 154.57
- rail      1     11735  58325 156.39
- air       1     15657  62246 157.50
- road      1     17009  63598 157.86
- airtran   1     58169 104758 166.34
- railtran  1     64855 111444 167.40
- income    1    148468 195057 176.91
- level     1    163524 210114 178.18
- expense   1    353482 400071 189.12

Step:  AIC=153.73
travel ~ income + expense + level + road + rail + air + railtran + 
    airtran

           Df Sum of Sq    RSS    AIC
<none>                   49866 153.73
- rail      1     10380  60246 154.94
- air       1     12886  62752 155.63
- road      1     15241  65107 156.26
- railtran  1     62438 112303 165.53
- airtran   1     86215 136081 168.79
- level     1    164237 214103 176.50
- expense   1    351763 401629 187.19
- income    1    416943 466809 189.75
> summary(model1)

Call:
lm(formula = travel ~ income + expense + level + road + rail + air + 
    railtran + airtran)

Residuals:
    Min      1Q  Median      3Q     Max 
-66.673 -57.766   2.796  46.749  91.039 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -4.393e+03  2.102e+03  -2.090 0.070022 .  
income       1.898e-02  2.320e-03   8.179 3.72e-05 ***
expense      7.038e+00  9.369e-01   7.512 6.85e-05 ***
level       -5.427e+00  1.057e+00  -5.133 0.000893 ***
road        -1.460e+00  9.339e-01  -1.564 0.156518    
rail         3.697e+02  2.865e+02   1.290 0.232935    
air         -3.589e+00  2.496e+00  -1.438 0.188431    
railtran     2.166e-02  6.843e-03   3.165 0.013295 *  
airtran      2.032e-01  5.464e-02   3.719 0.005879 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 78.95 on 8 degrees of freedom
Multiple R-squared:  0.9997,  Adjusted R-squared:  0.9994 
F-statistic:  3529 on 8 and 8 DF,  p-value: 2.252e-13

> model2 <- drop1(model1)   # refit with one variable removed at a time
> model2
Single term deletions

Model:
travel ~ income + expense + level + road + rail + air + railtran + 
    airtran
         Df Sum of Sq    RSS    AIC
<none>                 49866 153.73
income    1    416943 466809 189.75
expense   1    351763 401629 187.19
level     1    164237 214103 176.50
road      1     15241  65107 156.26
rail      1     10380  60246 154.94
air       1     12886  62752 155.63
railtran  1     62438 112303 165.53
airtran   1     86215 136081 168.79
> model3 <- update(model1, . ~ . - rail)   # remove rail
> summary(model3)

Call:
lm(formula = travel ~ income + expense + level + road + air + railtran + 
    airtran)

Residuals:
    Min      1Q  Median      3Q     Max 
-77.120 -62.739  -7.682  57.073  96.157 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -1.773e+03  5.648e+02  -3.140 0.011936 *  
income       1.935e-02  2.386e-03   8.112 1.98e-05 ***
expense      7.977e+00  6.116e-01  13.043 3.77e-07 ***
level       -5.126e+00  1.069e+00  -4.797 0.000978 ***
road        -2.214e+00  7.550e-01  -2.933 0.016676 *  
air         -5.129e+00  2.272e+00  -2.257 0.050398 .  
railtran     1.495e-02  4.613e-03   3.241 0.010144 *  
airtran      2.603e-01  3.323e-02   7.832 2.62e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 81.82 on 9 degrees of freedom
Multiple R-squared:  0.9997,  Adjusted R-squared:  0.9994 
F-statistic:  3756 on 7 and 9 DF,  p-value: 7.348e-15

> model4 <- update(model3, . ~ . - air)   # remove air
> summary(model4)

Call:
lm(formula = travel ~ income + expense + level + road + railtran + 
    airtran)

Residuals:
    Min      1Q  Median      3Q     Max 
-165.78  -44.43   12.86   49.24  123.92 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.450e+03  5.683e+02  -4.310  0.00154 ** 
income       1.834e-02  2.782e-03   6.593 6.13e-05 ***
expense      7.465e+00  6.742e-01  11.072 6.21e-07 ***
level       -5.389e+00  1.261e+00  -4.273  0.00163 ** 
road        -2.381e+00  8.921e-01  -2.669  0.02355 *  
railtran     1.933e-02  4.970e-03   3.889  0.00301 ** 
airtran      2.451e-01  3.864e-02   6.343 8.42e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 97.14 on 10 degrees of freedom
Multiple R-squared:  0.9995,  Adjusted R-squared:  0.9991 
F-statistic:  3108 on 6 and 10 DF,  p-value: 9.282e-16

> resid <- resid(model4)
> resid
          1           2           3           4           5           6 
  32.124983   -8.719782   12.857759  -83.089036   50.686957   47.681664 
          7           8           9          10          11          12 
 -54.769913  -28.532399  123.920008   80.382480 -165.782946   33.008293 
         13          14          15          16          17 
 -28.870183  -44.425769 -112.199289   96.483549   49.243624 
> shapiro.test(resid)   # Shapiro-Wilk (W) normality test

	Shapiro-Wilk normality test

data:  resid
W = 0.9756, p-value = 0.9066

> y <- predict(model4)
> rstandard <- rstandard(model4)
> plot(y, rstandard)    # standardized residuals vs fitted values
> plot(model4, 1)       # the four standard diagnostic plots
> plot(model4, 2)
> plot(model4, 3)
> plot(model4, 4)
> aa <- data.frame(travel, income, expense, level, road, railtran, airtran)
> b <- aa[, 2:7]
> bb <- cor(b)
> kappa(bb, exact = TRUE)   # condition number (kappa)
[1] 1366.411
> eigen(bb)                 # eigenvalues and eigenvectors of the matrix
$values
[1] 5.585778007 0.284630469 0.091524258 0.028411957 0.005567391
[6] 0.004087919

$vectors
           [,1]        [,2]       [,3]       [,4]         [,5]
[1,] -0.4203320 -0.11207006  0.1593664  0.4002601 -0.264478984
[2,] -0.3747155  0.86124014  0.1616264 -0.2709193  0.115775260
[3,] -0.4210500  0.07658249 -0.0981632  0.3758770 -0.550564160
[4,] -0.4080864 -0.16947687 -0.7878004 -0.4077742  0.004567634
[5,] -0.4033123 -0.43364039  0.5637918 -0.5528584 -0.073879174
[6,] -0.4200367 -0.15190277  0.0187778  0.3913807  0.779773728
             [,6]
[1,]  0.745121692
[2,]  0.070209783
[3,] -0.602338494
[4,]  0.133464995
[5,] -0.142560576
[6,] -0.19727183

> bbb <- kappa(cor(b[, colnames(b) != "level"]))   # kappa after removing level
> bbb
[1] 529.9542
> kappa(cor(b[, colnames(b) != "income"]))
[1] 537.9962
> kappa(cor(b[, colnames(b) != "airtran"]))
[1] 624.6458
> summary(update(model4, . ~ . - level))

Call:
lm(formula = travel ~ income + expense + road + railtran + airtran)

Residuals:
    Min      1Q  Median      3Q     Max 
-322.63  -58.04  -11.62   95.45  214.69 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.824e+03  7.511e+02  -5.091 0.000349 ***
income       1.217e-02  3.811e-03   3.194 0.008552 ** 
expense      5.483e+00  7.843e-01   6.991  2.3e-05 ***
road        -4.247e+00  1.247e+00  -3.407 0.005855 ** 
railtran     2.708e-02  7.416e-03   3.651 0.003811 ** 
airtran      1.929e-01  5.876e-02   3.284 0.007288 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 155.7 on 11 degrees of freedom
Multiple R-squared:  0…
```
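The AIC column printed by step() and drop1() in the output above can be reproduced from the RSS column alone. For a linear model, R's extractAIC() uses n·log(RSS/n) + 2·edf, which is AIC up to an additive constant. Here edf counts every estimated coefficient, including the intercept. The sketch below assumes n = 17 annual observations and edf = 9 for model1 (8 regressors plus an intercept), so each single-term deletion has edf = 8. The function name `extract_aic` is a hypothetical stand-in, not R's API.

```python
import math

N_OBS = 17  # annual observations, 1994-2010


def extract_aic(rss, n, edf):
    """AIC of a Gaussian linear model up to an additive constant, in the form
    n * log(RSS / n) + 2 * edf; edf includes the intercept."""
    return n * math.log(rss / n) + 2 * edf


# RSS column of the drop1(model1) table above.
BASE_RSS = 49866
DROP_RSS = {
    "income": 466809, "expense": 401629, "level": 214103, "road": 65107,
    "rail": 60246, "air": 62752, "railtran": 112303, "airtran": 136081,
}

base_aic = extract_aic(BASE_RSS, N_OBS, edf=9)
drop_aic = {name: extract_aic(rss, N_OBS, edf=8) for name, rss in DROP_RSS.items()}

print(f"<none>     AIC = {base_aic:.2f}")
for name, aic in sorted(drop_aic.items(), key=lambda kv: kv[1]):
    print(f"- {name:8s} AIC = {aic:.2f}  (change {aic - base_aic:+.2f})")

# The deletion with the smallest AIC; here every deletion raises AIC,
# which is consistent with step() stopping at model1.
best_drop = min(drop_aic, key=drop_aic.get)
print("cheapest single deletion:", best_drop)
```

The same function with RSS = 42897 and edf = 11 gives about 155.17, the starting AIC of the full model in the step() trace.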
