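Regression Course Project
A Regression Analysis of the Factors Influencing Domestic Tourism Consumption

I. Problem Introduction

China's tertiary sector has grown rapidly; by 2010 it accounted for 43.14% of GDP. Tourism holds an important place within the tertiary sector and is closely tied to catering, accommodation, leisure, and transportation. This analysis therefore aims to identify the factors that influence domestic tourism consumption and to build a regression model for it.

II. Model Design

A multiple linear model is fitted first. If the fit is not significant, a log or square-root transformation, polynomial fitting, or another model is tried instead.

1. Correlation analysis. First identify the variables that are correlated with the dependent variable.
2. Fit the full multiple linear regression model. If the overall F-test fails, look for the cause and change the model. If some regression coefficients fail their tests, carry out variable selection (step 3), drop some variables, and continue. If every test is satisfactory, the model is provisionally accepted and step 3 is skipped.
3. Screen variables by stepwise regression and run t-tests. If the result is significant, the multiple linear regression model is provisionally accepted. If some variables still fail, screen them individually, using the AIC criterion among others to decide which to drop, until every remaining variable passes its t-test.
4. Regression diagnostics. Analyze the residuals: test whether they are normally distributed, whether they are autocorrelated, whether there are outliers or influential observations, whether there is heteroscedasticity, and whether there is multicollinearity among the predictors. If any of these problems is present, revise the model, re-screen the variables, or add or remove observations.
5. Settle on the final model.

III. Data

year    income  number  expense   level    road  rail
1994   48108.5     524    195.3   320.0  111.78  5.90
1995   59810.5     629    218.7   345.1  115.70  6.24
1996   70142.5     640    256.2   377.6  118.58  6.49
1997   78060.9     644    328.1   394.6  122.64  6.60
1998   83024.3     695    345.0   417.8  127.85  6.64
1999   88479.2     719    394.0   452.3  135.17  6.74
2000   98000.5     744    426.6   491.0  140.27  6.87
2001  108068.2     784    449.5   521.2  169.80  7.01
2002  119095.7     878    441.8   557.6  176.52  7.19
2003  135174.0     870    395.7   596.9  180.98  7.30
2004  159586.8    1102    427.5   645.3  187.07  7.44
2005  183618.5    1212    436.1   695.2  334.52  7.54
2006  215883.9    1394    446.9   761.9  345.70  7.71
2007  266411.0    1610    482.6   843.4  358.37  7.80
2008  315274.7    1712    511.0   916.8  373.02  7.97
2009  341401.5    1902    535.4  1001.6  386.08  8.55
2010  403260.0    2103    598.2  1062.6  400.82  9.12

year     air  railtran  roadtran  shiptran  airtran   travel
1994  104.56    108738    953940     26165     4039   1023.5
1995  112.90    102745   1040810     23924     5117   1375.7
1996  116.65     94797   1122110     22895     5555   1638.4
1997  142.50     93308   1204583     22573     5630   2112.7
1998  150.58     95085   1257332     20545     5755   2391.2
1999  152.22    100164   1269004     19151     6094   2831.9
2000  150.29    105073   1347392     19386     6722   3175.5
2001  155.36    105155   1402798     18645     7524   3522.4
2002  163.77    105606   1475257     18693     8594   3878.4
2003  174.95     97260   1464335     17142     8759   3442.3
2004  204.94    111764   1624526     19040    12123   4710.7
2005  199.85    115583   1697381     20227    13827   5285.9
2006  211.35    125656   1860487     22047    15968   6229.7
2007  234.30    135670   2050680     22835    18576   7770.6
2008  246.18    146193   2682114     20334    19251   8749.3
2009  234.51    152451   2779081     22314    23052  10183.7
2010  276.51    167609   3052738     22392    26769  12579.8

Data source: China Statistical Yearbook 2011.

Variable definitions:
year: year.
income: gross national income, in 100 million yuan.
number: number of tourists.
expense: per-capita tourism spending, in yuan.
level: household consumption level index, base year 1978.
road: length of highways, in 10,000 km.
rail: length of railways, in 10,000 km.
air: length of civil aviation routes, in 10,000 km.
roadtran: highway passenger traffic, in 10,000 persons.
railtran: railway passenger traffic, in 10,000 persons.
shiptran: waterway passenger traffic, in 10,000 persons.
airtran: civil aviation passenger traffic, in 10,000 persons.
travel: total domestic tourism consumption, in 100 million yuan.

IV. Regression Analysis

1. Correlation

Correlation is examined first by drawing a scatterplot matrix.

[Figure: scatterplot matrix of all variables]

The plot shows fairly directly that travel is strongly correlated with most of the variables; the two doubtful cases are road and shiptran. Correlation tests on these two show that travel and road are linearly correlated (correlation coefficient 0.93, p-value = 4.563e-08), whereas travel and shiptran are uncorrelated (p-value = 0.9983). shiptran is therefore excluded before the regression is fitted.

2. Full regression model

Fitting the multiple regression model directly gives:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.972e+03  3.193e+03  -1.870 0.110617
income       2.151e-02  4.779e-03   4.501 0.004100 **
number       1.039e+00  1.446e+00   0.719 0.499354
expense      6.805e+00  1.124e+00   6.052 0.000922 ***
level       -5.815e+00  1.261e+00  -4.610 0.003653 **
road        -1.468e+00  1.019e+00  -1.441 0.199608
rail         6.274e+02  4.462e+02   1.406 0.209292
air         -4.155e+00  2.790e+00  -1.490 0.186935
railtran     2.524e-02  8.492e-03   2.972 0.024903 *
roadtran    -4.093e-04  4.554e-04  -0.899 0.403410
airtran      1.058e-01  1.272e-01   0.832 0.437327
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 84.55 on 6 degrees of freedom
Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
F-statistic: 2462 on 10 and 6 DF, p-value: 5.061e-10

Here R-squared = 0.9998 and the F-test p-value is 5.061e-10, so the regression as a whole is significant. Not all coefficients pass their t-tests, however, so variable selection is needed.

3. Variable selection

Stepwise regression is run first. Using AIC as the main criterion, it removes two variables, roadtran and number, and the AIC of the reduced model is 153.73, its minimum. Re-checking the regression model:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.393e+03  2.102e+03  -2.090 0.070022 .
income       1.898e-02  2.320e-03   8.179 3.72e-05 ***
expense      7.038e+00  9.369e-01   7.512 6.85e-05 ***
level       -5.427e+00  1.057e+00  -5.133 0.000893 ***
road        -1.460e+00  9.339e-01  -1.564 0.156518
rail         3.697e+02  2.865e+02   1.290 0.232935
air         -3.589e+00  2.496e+00  -1.438 0.188431
railtran     2.166e-02  6.843e-03   3.165 0.013295 *
airtran      2.032e-01  5.464e-02   3.719 0.005879 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 78.95 on 8 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
F-statistic: 3529 on 8 and 8 DF, p-value: 2.252e-13

The model has improved. The adjusted R-squared is 0.9994; any gain is too small to show at the printed precision, but the residual standard error falls from 84.55 to 78.95, which agrees with the AIC criterion. The coefficient tests have also improved, but road, rail, and air still fail. Dropping one variable at a time gives:

         Df Sum of Sq    RSS    AIC
<none>                 49866 153.73
income    1    416943 466809 189.75
expense   1    351763 401629 187.19
level     1    164237 214103 176.50
road      1     15241  65107 156.26
rail      1     10380  60246 154.94
air       1     12886  62752 155.63
railtran  1     62438 112303 165.53
airtran   1     86215 136081 168.79

Dropping rail gives the smallest increase in AIC and also the smallest increase in RSS. The coefficient tests after dropping rail are:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.773e+03  5.648e+02  -3.140 0.011936 *
income       1.935e-02  2.386e-03   8.112 1.98e-05 ***
expense      7.977e+00  6.116e-01  13.043 3.77e-07 ***
level       -5.126e+00  1.069e+00  -4.797 0.000978 ***
road        -2.214e+00  7.550e-01  -2.933 0.016676 *
air         -5.129e+00  2.272e+00  -2.257 0.050398 .
railtran     1.495e-02  4.613e-03   3.241 0.010144 *
airtran      2.603e-01  3.323e-02   7.832 2.62e-05 ***

Only air fails the test at the 0.05 significance level. Dropping air as well gives:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.450e+03  5.683e+02  -4.310  0.00154 **
income       1.834e-02  2.782e-03   6.593 6.13e-05 ***
expense      7.465e+00  6.742e-01  11.072 6.21e-07 ***
level       -5.389e+00  1.261e+00  -4.273  0.00163 **
road        -2.381e+00  8.921e-01  -2.669  0.02355 *
railtran     1.933e-02  4.970e-03   3.889  0.00301 **
airtran      2.451e-01  3.864e-02   6.343 8.42e-05 ***

All regression coefficients now pass their tests, and the regression model is provisionally accepted.

4. Regression diagnostics

The residuals are computed and the Shapiro-Wilk (W) normality test is applied; it gives p-value = 0.9066, so the normality hypothesis cannot be rejected. The plot of standardized residuals against fitted values is:

[Figure: standardized residuals versus fitted values]

The plot shows that the residuals are evenly scattered with no visible pattern, so the basic assumptions of linear regression are satisfied and there is no autocorrelation. Next, consider the standard diagnostic plots:

[Figure: diagnostic plots for lm(travel ~ income + expense + level + road + railtran + airtran): Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

Taking the four plots together, observations 11 and 15 may be influential. The cause still needs to be investigated; it may lie in how the statistics were compiled or in the method of analysis. Removing them weakens the regression, so they are kept for now.

Multicollinearity is checked next. kappa = 1366.411 > 1000, so multicollinearity is present. The eigenvalues close to zero and their eigenvectors are:

eigenvalue 0.004087919, eigenvector [,6]:
[1,]  0.74512169
[2,]  0.07020978
[3,] -0.60233849
[4,]  0.13346499
[5,] -0.14256057
[6,] -0.19727183

eigenvalue 0.005567391, eigenvector [,5]:
[1,] -0.264478984
[2,]  0.115775260
[3,] -0.550564160
[4,]  0.004567634
[5,] -0.073879174
[6,]  0.779773728

Components 1, 3, and 6 dominate, so income, level, and airtran may be severely collinear, most likely income and level. This also makes economic sense: the higher the national income, the higher the consumption level, and only then do more people fly; the link between the first two is the more direct one. The cause is probably a redundant predictor, so income, level, and airtran are each removed in turn, the regression is refitted, and kappa is recomputed. Whichever one is removed, kappa falls to less than half of its original value (529.95 without level, 537.99 without income, 624.65 without airtran), but only removing level leaves the regression essentially unaffected:

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.824e+03  7.511e+02  -5.091 0.000349 ***
income       1.217e-02  3.811e-03   3.194 0.008552 **
expense      5.483e+00  7.843e-01   6.991  2.3e-05 ***
road        -4.247e+00  1.247e+00  -3.407 0.005855 **
railtran     2.708e-02  7.416e-03   3.651 0.003811 **
airtran      1.929e-01  5.876e-02   3.284 0.007288 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 155.7 on 11 degrees of freedom
Multiple R-squared: 0.9985, Adjusted R-squared: 0.9978
F-statistic: 1450 on 5 and 11 DF, p-value: 4.078e-15

level can therefore be dropped. Heteroscedasticity is then tested with the rank correlation method: the rank correlation coefficients between the absolute residuals and the predictors are 0.2156863, 0.05637255, 0.2156863, 0, and 0.2156863. None indicates a correlation, so the model fits well.

5. Final model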
travel = -3.824e+03 + 1.217e-02*income + 5.483*expense - 4.247*road + 2.708e-02*railtran + 1.929e-01*airtran

V. Comments on the Model

According to the model, domestic tourism consumption can be modeled and predicted from national income, per-capita tourism spending, railway passenger traffic, civil aviation passenger traffic, and highway length, which agrees with practical intuition. The first two reflect living standards; the last three reflect national transport infrastructure and happen to cover road, rail, and air. The fitted equation is therefore broadly consistent with its practical meaning, and the influencing factors are largely identified. Because the result depends on the initial choice of predictors, however, some important variables may not have been included.

VI. Program Code and Output (language: R)

> x = read.csv("数据.csv", head = T)
> attach(x)   # make the columns visible by name for the calls below
> a = x[, 2:13]
> plot(a)

[Figure: scatterplot matrix produced by plot(a)]

> cor.test(road, travel)   # correlation test

Pearson's product-moment correlation
data: road and travel
t = 10.0692, df = 15, p-value = 4.563e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.8209980 0.9761007
sample estimates:
      cor
0.9333393

> cor.test(shiptran, travel)

Pearson's product-moment correlation
data: shiptran and travel
t = 0.0021, df = 15, p-value = 0.9983
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.4802217 0.4810676
sample estimates:
         cor
0.0005500457

> model = lm(travel ~ income + number + expense + level + road + rail + air + railtran + roadtran + airtran)
> summary(model)   # fit the full regression model

Call:
lm(formula = travel ~ income + number + expense + level + road + rail + air + railtran + roadtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-72.549 -44.860   3.562  44.806  90.603
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.972e+03  3.193e+03  -1.870 0.110617
income       2.151e-02  4.779e-03   4.501 0.004100 **
number       1.039e+00  1.446e+00   0.719 0.499354
expense      6.805e+00  1.124e+00   6.052 0.000922 ***
level       -5.815e+00  1.261e+00  -4.610 0.003653 **
road        -1.468e+00  1.019e+00  -1.441 0.199608
rail         6.274e+02  4.462e+02   1.406 0.209292
air         -4.155e+00  2.790e+00  -1.490 0.186935
railtran     2.524e-02  8.492e-03   2.972 0.024903 *
roadtran    -4.093e-04  4.554e-04  -0.899 0.403410
airtran      1.058e-01  1.272e-01   0.832 0.437327
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 84.55 on 6 degrees of freedom
Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
F-statistic: 2462 on 10 and 6 DF, p-value: 5.061e-10

> model1 = step(model)   # stepwise regression

Start: AIC=155.17
travel ~ income + number + expense + level + road + rail + air + railtran + roadtran + airtran
           Df Sum of Sq    RSS    AIC
- number    1      3693  46589 154.57
- airtran   1      4948  47844 155.02
<none>                   42897 155.17
- roadtran  1      5775  48671 155.31
- rail      1     14137  57033 158.01
- road      1     14850  57746 158.22
- air       1     15862  58758 158.52
- railtran  1     63136 106033 168.55
- income    1    144834 187731 178.26
- level     1    151949 194845 178.90
- expense   1    261858 304755 186.50

Step: AIC=154.57
travel ~ income + expense + level + road + rail + air + railtran + roadtran + airtran
           Df Sum of Sq    RSS    AIC
- roadtran  1      3276  49866 153.73
<none>                   46589 154.57
- rail      1     11735  58325 156.39
- air       1     15657  62246 157.50
- road      1     17009  63598 157.86
- airtran   1     58169 104758 166.34
- railtran  1     64855 111444 167.40
- income    1    148468 195057 176.91
- level     1    163524 210114 178.18
- expense   1    353482 400071 189.12

Step: AIC=153.73
travel ~ income + expense + level + road + rail + air + railtran + airtran
           Df Sum of Sq    RSS    AIC
<none>                   49866 153.73
- rail      1     10380  60246 154.94
- air       1     12886  62752 155.63
- road      1     15241  65107 156.26
- railtran  1     62438 112303 165.53
- airtran   1     86215 136081 168.79
- level     1    164237 214103 176.50
- expense   1    351763 401629 187.19
- income    1    416943 466809 189.75

> summary(model1)

Call:
lm(formula = travel ~ income + expense + level + road + rail + air + railtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-66.673 -57.766   2.796  46.749  91.039
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.393e+03  2.102e+03  -2.090 0.070022 .
income       1.898e-02  2.320e-03   8.179 3.72e-05 ***
expense      7.038e+00  9.369e-01   7.512 6.85e-05 ***
level       -5.427e+00  1.057e+00  -5.133 0.000893 ***
road        -1.460e+00  9.339e-01  -1.564 0.156518
rail         3.697e+02  2.865e+02   1.290 0.232935
air         -3.589e+00  2.496e+00  -1.438 0.188431
railtran     2.166e-02  6.843e-03   3.165 0.013295 *
airtran      2.032e-01  5.464e-02   3.719 0.005879 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 78.95 on 8 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
F-statistic: 3529 on 8 and 8 DF, p-value: 2.252e-13

> model2 = drop1(model1)   # refit with one variable dropped at a time
> model2

Single term deletions
Model:
travel ~ income + expense + level + road + rail + air + railtran + airtran
         Df Sum of Sq    RSS    AIC
<none>                 49866 153.73
income    1    416943 466809 189.75
expense   1    351763 401629 187.19
level     1    164237 214103 176.50
road      1     15241  65107 156.26
rail      1     10380  60246 154.94
air       1     12886  62752 155.63
railtran  1     62438 112303 165.53
airtran   1     86215 136081 168.79

> model3 = update(model1, . ~ . - rail)   # drop rail
> summary(model3)

Call:
lm(formula = travel ~ income + expense + level + road + air + railtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-77.120 -62.739  -7.682  57.073  96.157
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.773e+03  5.648e+02  -3.140 0.011936 *
income       1.935e-02  2.386e-03   8.112 1.98e-05 ***
expense      7.977e+00  6.116e-01  13.043 3.77e-07 ***
level       -5.126e+00  1.069e+00  -4.797 0.000978 ***
road        -2.214e+00  7.550e-01  -2.933 0.016676 *
air         -5.129e+00  2.272e+00  -2.257 0.050398 .
railtran     1.495e-02  4.613e-03   3.241 0.010144 *
airtran      2.603e-01  3.323e-02   7.832 2.62e-05 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 81.82 on 9 degrees of freedom
Multiple R-squared: 0.9997, Adjusted R-squared: 0.9994
F-statistic: 3756 on 7 and 9 DF, p-value: 7.348e-15

> model4 = update(model3, . ~ . - air)   # drop air
> summary(model4)

Call:
lm(formula = travel ~ income + expense + level + road + railtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-165.78  -44.43   12.86   49.24  123.92
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.450e+03  5.683e+02  -4.310  0.00154 **
income       1.834e-02  2.782e-03   6.593 6.13e-05 ***
expense      7.465e+00  6.742e-01  11.072 6.21e-07 ***
level       -5.389e+00  1.261e+00  -4.273  0.00163 **
road        -2.381e+00  8.921e-01  -2.669  0.02355 *
railtran     1.933e-02  4.970e-03   3.889  0.00301 **
airtran      2.451e-01  3.864e-02   6.343 8.42e-05 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 97.14 on 10 degrees of freedom
Multiple R-squared: 0.9995, Adjusted R-squared: 0.9991
F-statistic: 3108 on 6 and 10 DF, p-value: 9.282e-16

> resid = resid(model4)
> resid
          1           2           3           4           5           6
  32.124983   -8.719782   12.857759  -83.089036   50.686957   47.681664
          7           8           9          10          11          12
 -54.769913  -28.532399  123.920008   80.382480 -165.782946   33.008293
         13          14          15          16          17
 -28.870183  -44.425769 -112.199289   96.483549   49.243624

> shapiro.test(resid)   # Shapiro-Wilk (W) normality test

Shapiro-Wilk normality test
data: resid
W = 0.9756, p-value = 0.9066

> y = predict(model4)
> rstandard = rstandard(model4)
> plot(y, rstandard)

[Figure: standardized residuals versus fitted values]

> plot(model4, 1)
> plot(model4, 2)
> plot(model4, 3)
> plot(model4, 4)

[Figure: diagnostic plots for lm(travel ~ income + expense + level + road + railtran + airtran): Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance]

> attach(x)
> aa = data.frame(travel, income, expense, level, road, railtran, airtran)
> b = aa[, 2:7]
> bb = cor(b)
> kappa(bb, exact = T)   # compute kappa (condition number)
[1] 1366.411
> eigen(bb)   # eigenvalues and eigenvectors of the correlation matrix
$values
[1] 5.585778007 0.284630469 0.091524258 0.028411957 0.005567391
[6] 0.004087919
$vectors
           [,1]        [,2]       [,3]       [,4]         [,5]
[1,] -0.4203320 -0.11207006  0.1593664  0.4002601 -0.264478984
[2,] -0.3747155  0.86124014  0.1616264 -0.2709193  0.115775260
[3,] -0.4210500  0.07658249 -0.0981632  0.3758770 -0.550564160
[4,] -0.4080864 -0.16947687 -0.7878004 -0.4077742  0.004567634
[5,] -0.4033123 -0.43364039  0.5637918 -0.5528584 -0.073879174
[6,] -0.4200367 -0.15190277  0.0187778  0.3913807  0.779773728
            [,6]
[1,]  0.74512169
[2,]  0.07020978
[3,] -0.60233849
[4,]  0.13346499
[5,] -0.14256057
[6,] -0.19727183

> bbb = kappa(cor(b[, colnames(b) != "level"]))   # kappa after removing level
> bbb
[1] 529.9542
> kappa(cor(b[, colnames(b) != "income"]))
[1] 537.9962
> kappa(cor(b[, colnames(b) != "airtran"]))
[1] 624.6458

> summary(update(model4, . ~ . - level))

Call:
lm(formula = travel ~ income + expense + road + railtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-322.63  -58.04  -11.62   95.45  214.69
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.824e+03  7.511e+02  -5.091 0.000349 ***
income       1.217e-02  3.811e-03   3.194 0.008552 **
expense      5.483e+00  7.843e-01   6.991  2.3e-05 ***
road        -4.247e+00  1.247e+00  -3.407 0.005855 **
railtran     2.708e-02  7.416e-03   3.651 0.003811 **
airtran      1.929e-01  5.876e-02   3.284 0.007288 **
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 155.7 on 11 degrees of freedom
Multiple R-squared: 0.9985, Adjusted R-squared: 0.9978
F-statistic: 1450 on 5 and 11 DF, p-value: 4.078e-15

> summary(update(model4, . ~ . - income))

Call:
lm(formula = travel ~ expense + level + road + railtran + airtran)
Residuals:
    Min      1Q  Median      3Q     Max
-466.59  -84.23  -15.25  150.26  246.75
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.844e+03  9.639e+02  -5.025 0.000387 ***
expense      7.056e+00  1.480e+00   4.767 0.000583 ***
level       -1.075e+00  2.377e+00  -0.452 0.659854
road        -4.336e+00  1.855e+00  -2.337 0.039350 *
railtran     3.611e-02  9.409e-03   3.838 0.002755 **
airtran      3.690e-01  7.443e-02   4.958 0.000430 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 214.1 on 11 degrees of freedom
Multiple R-squared: 0.9971, Adjusted R-squared: 0.9958
F-statistic: 765.5 on 5 and 11 DF, p-value: 1.357e-13

> summary(update(model4, . ~ . - airtran))

Call:
lm(formula = travel ~ income + expense + level
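The rank-correlation check for heteroscedasticity described in Section IV.4 does not appear in the part of the listing that survives. The following is a minimal R sketch of one way such a check could be run, not the author's original code. It assumes the final five-predictor model and re-keys the relevant columns from the table in Section III. The names d, fit, abs_res, predictors, and rank_test are illustrative and do not come from the original program, and the coefficients it prints need not match those quoted in Section IV.4.

```r
# Columns re-keyed from the data table in Section III (assumed transcription).
d <- data.frame(
  travel   = c(1023.5, 1375.7, 1638.4, 2112.7, 2391.2, 2831.9, 3175.5, 3522.4,
               3878.4, 3442.3, 4710.7, 5285.9, 6229.7, 7770.6, 8749.3,
               10183.7, 12579.8),
  income   = c(48108.5, 59810.5, 70142.5, 78060.9, 83024.3, 88479.2, 98000.5,
               108068.2, 119095.7, 135174.0, 159586.8, 183618.5, 215883.9,
               266411.0, 315274.7, 341401.5, 403260.0),
  expense  = c(195.3, 218.7, 256.2, 328.1, 345.0, 394.0, 426.6, 449.5, 441.8,
               395.7, 427.5, 436.1, 446.9, 482.6, 511.0, 535.4, 598.2),
  road     = c(111.78, 115.70, 118.58, 122.64, 127.85, 135.17, 140.27, 169.80,
               176.52, 180.98, 187.07, 334.52, 345.70, 358.37, 373.02,
               386.08, 400.82),
  railtran = c(108738, 102745, 94797, 93308, 95085, 100164, 105073, 105155,
               105606, 97260, 111764, 115583, 125656, 135670, 146193,
               152451, 167609),
  airtran  = c(4039, 5117, 5555, 5630, 5755, 6094, 6722, 7524, 8594, 8759,
               12123, 13827, 15968, 18576, 19251, 23052, 26769)
)

# Final model from Section IV.5 (level already removed).
fit <- lm(travel ~ income + expense + road + railtran + airtran, data = d)

# Spearman rank correlation between |residual| and each predictor.
# exact = FALSE requests the asymptotic p-value and avoids tie warnings.
abs_res <- abs(resid(fit))
predictors <- c("income", "expense", "road", "railtran", "airtran")
rank_test <- t(sapply(predictors, function(v) {
  ct <- cor.test(abs_res, d[[v]], method = "spearman", exact = FALSE)
  c(rho = unname(ct$estimate), p.value = ct$p.value)
}))

print(round(rank_test, 4))
```

Rank correlations near zero with large p-values would give no evidence that the spread of the residuals changes with a predictor, which is the conclusion drawn in Section IV.4.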