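Examination format: independently set paper

Foshan University of Science and Technology, 2008-2009 Academic Year, First Semester

Data Analysis: Final Examination, Paper A

Major and class:          Name:          Student ID:

Question   I  II  III  IV  V  VI  VII  VIII  IX  X  XI  XII  Total score

Score

Instructions: 1. Read each question carefully and carry out the required programming and computation in the SAS software system.

2. Copy the SAS programs and the relevant output to the end of the paper as your answers.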

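I. (12 points) Short-answer questions on SAS:

1. Under the Windows operating system, which three parts make up the SAS interface?

The Log window, the Editor window, and the Output window.

2. How is a non-numeric variable entered?

Append "$" after the non-numeric (character) variable.

3. Which marker should be added when data are entered in free format rather than fixed format?

Add "@@" (the trailing line-hold specifier, which lets several observations be read from one data line).

4. Write down the formula for the trimean.

$\hat{M} = \frac{1}{4}Q_1 + \frac{1}{2}M + \frac{1}{4}Q_3$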

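II. (15 points) The year-on-year GDP growth index of Beijing for 1978〜1995 is given below:

100.00 107.57 112.42  96.21 121.58 107.21 117.16 116.19 101.37

109.78 112.83 104.37 105.40 109.50 111.60 112.10 113.50 112.40

(1) Compute the mean, variance, standard deviation, coefficient of variation, skewness, and kurtosis;

(2) Compute the median, the upper and lower quartiles, and the interquartile range;

(3) Draw the histogram, Q-Q plot, stem-and-leaf plot, and box plot;

(4) Carry out the W test for normality (take α = 0.05).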


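III. (15 points) The following data are given:

  x1    x2    x3    x4
16.7  26.7   6.4  35.0
18.2  28.0   3.2  29.7
16.7  26.7   2.1  34.9
18.1  26.7   4.3  31.5
16.7  26.0   3.0  32.7
18.1  30.2   7.0  34.9
20.2  30.5   4.8  34.4
20.2  29.5   5.5  36.2
21.5  31.5   5.8  36.5
18.8  30.6   5.4  35.4
21.6  27.8   5.4  34.1
21.3  29.5   5.8  35.8

(1) Compute the covariance matrix and the Pearson correlation matrix;

(2) Analyze the correlations among the variables (take α = 0.10).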

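IV. (15 points) The output y of a factory, the number of workers x1, and the cost x2 are given below:

No.    y   x1    x2
  1  169  265  3782
  2   81   98  3008
  3  192  330  2450
  4  116  195  2137
  5   55   53  2560
  6  162  274  2450
  7  120  180  3254
  8  223  375  3802
  9  131  205  2838
 10   67   86  2347

(1) Find the regression equation and give a practical interpretation of each parameter;

(2) Report the analysis of variance and the parameter estimates.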

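V. (13 points) The following data are given:

  x1    x2    x3    x4    x5     x6    x7
12.5  16.4  16.7  22.8  29.3  3.017  26.6
 7.8   9.9  10.2  12.6  17.6  0.841  10.6
13.4  10.9   9.9  10.9  13.9  1.772  17.8
19.1  19.8  19.0  29.7  39.6  2.449  35.8
 8.0   9.8   8.9  11.9  16.2  0.789  13.7
 9.7   4.2   4.2   4.6   6.5  0.874   3.9
 0.6   0.7   0.7   0.8   1.1  0.056   1.0
13.9   9.4   9.3   9.8  13.3  2.126  17.1
 9.1  11.3   9.5  12.2  16.4  1.327  11.6

Carry out a principal component analysis of this sample and find the corresponding principal components.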


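VI. (15 points) The following data are given (sample 16 is the one to be classified):

No.  Class     x1     x2     x3     x4     x5    x6     x7
  1    1    36.05   7.13   7.75  16.67  11.68  2.38  12.88
  2    1    37.69   7.01   8.94  16.15  11.08  0.83  11.67
  3    1    38.69   6.01   8.82  14.79  11.44  1.74  13.23
  4    1    37.75   9.61   8.49  13.15   9.76  1.28  11.28
  5    1    35.71   8.04   8.31  15.13   7.76  1.41  13.25
  6    1    39.77   8.49  12.94  19.27  11.05  2.04  13.29
  7    1    40.91   7.32   8.94  17.60  12.75  1.14  14.80
  8    1    33.70   7.59  10.98  18.82  14.73  1.78  10.10
  9    1    35.02   4.72   6.28  10.03   7.15  1.93  10.39
 10    2    52.41   7.70   9.98  12.53  11.70  2.31  14.69
 11    2    52.65   3.84   9.16  13.03  15.26  1.98  14.57
 12    2    55.85   5.50   7.45   9.55   9.52  2.21  16.30
 13    2    44.68   7.32  14.51  17.13  12.08  1.26  11.57
 14    2    45.79   7.66  10.36  16.56  12.86  2.75  11.69
 15    2    50.37  11.35  13.30  19.25  14.59  2.75  14.87
 16    ?    64.34   8.00  22.22  20.06  15.12  0.72  22.89

(1) Find the three covariance matrices;

(2) Use distance discrimination to obtain the linear discriminant functions, and compute the misclassification rate by cross-validation;

(3) Decide which class the sample to be classified belongs to.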

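VII. (15 points) Carry out a cluster analysis on the data of the previous question (16 samples in total):

(1) Single-linkage (shortest-distance) method: write out the clustering process and draw the dendrogram (take nclusters=4);

(2) Complete-linkage (longest-distance) method: write out the clustering process, draw the dendrogram (take nclusters=4), and report the four clustering statistics;

(3) Give the result of fast clustering into 3 classes and draw the classification plot in a plane coordinate system.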


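I. (1) The SAS interface comprises the Output window, the Log window, and the Editor window.

(2) Append the "$" symbol after a non-numeric (character) variable.

(3) Free-format data input should carry the "@@" marker.

(4) Trimean formula: $\hat{M} = \frac{1}{4}Q_1 + \frac{1}{2}M + \frac{1}{4}Q_3$, where $M$ is the median and $Q_1$, $Q_3$ are the lower and upper quartiles.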
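As a cross-check of the trimean formula, the following Python sketch applies it to the GDP data of Question II. It is not part of the original SAS answer. The helper `sas_percentile` is a hypothetical name; it assumes the default percentile definition of PROC UNIVARIATE (PCTLDEF=5, empirical distribution function with averaging).

```python
def sas_percentile(values, pct):
    """Percentile under the assumed SAS default definition (PCTLDEF=5)."""
    if not 0 < pct < 100:
        raise ValueError("pct must be an integer strictly between 0 and 100")
    xs = sorted(values)
    n = len(xs)
    # Write n * pct / 100 as j + g, with integer part j and fraction g.
    j, rem = divmod(n * pct, 100)
    if rem == 0:
        # g == 0: average the j-th and (j+1)-th order statistics (1-based).
        return (xs[j - 1] + xs[j]) / 2
    # g > 0: take the (j+1)-th order statistic (1-based).
    return xs[j]


# GDP growth index of Beijing, 1978-1995 (Question II).
gdp = [100.00, 107.57, 112.42, 96.21, 121.58, 107.21, 117.16, 116.19, 101.37,
       109.78, 112.83, 104.37, 105.40, 109.50, 111.60, 112.10, 113.50, 112.40]

q1 = sas_percentile(gdp, 25)
median = sas_percentile(gdp, 50)
q3 = sas_percentile(gdp, 75)

# Trimean: weighted average of the quartiles and the median.
trimean = 0.25 * q1 + 0.5 * median + 0.25 * q3
print(q1, median, q3, round(trimean, 4))
```

The quartiles and median obtained this way agree with the values reported for Question II (2).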

Program:

data t1;

input x @@;

cards;

100.00 107.57 112.42 96.21 121.58 107.21 117.16 116.19 101.37

109.78 112.83 104.37 105.40 109.50 111.60 112.1 113.50 112.40

;

proc univariate plot normal;

run;

proc capability graphics normal;

histogram x / normal;

qqplot x / normal(....);

run;

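N                        18     Sum Weights              18
Mean             109.510556     Sum Observations    1971.19
Std Deviation    6.36948929     Variance         40.5703938
Skewness         -0.3324812     Kurtosis         0.05978054
Uncorrected SS   216555.809     Corrected SS     689.696694
Coeff Variation  5.81632451     Std Error Mean   1.50130302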

(1) From the output above:

Mean: 109.510556    Variance: 40.5703938

Standard deviation: 6.36948929    Coefficient of variation: 5.81632451

Skewness: -0.3324812    Kurtosis: 0.05978054
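The moments above can be reproduced outside SAS. The sketch below is an illustration rather than part of the original answer; it assumes the bias-adjusted skewness and kurtosis formulas that SAS uses when the variance divisor is n - 1.

```python
import math

# GDP growth index of Beijing, 1978-1995 (Question II).
gdp = [100.00, 107.57, 112.42, 96.21, 121.58, 107.21, 117.16, 116.19, 101.37,
       109.78, 112.83, 104.37, 105.40, 109.50, 111.60, 112.10, 113.50, 112.40]

n = len(gdp)
mean = sum(gdp) / n
css = sum((x - mean) ** 2 for x in gdp)   # corrected sum of squares
variance = css / (n - 1)                  # sample variance
std = math.sqrt(variance)
cv = 100 * std / mean                     # coefficient of variation, in percent

# Standardized values used by the skewness and kurtosis formulas.
z = [(x - mean) / std for x in gdp]
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

for name, value in [("mean", mean), ("variance", variance), ("std", std),
                    ("cv", cv), ("skewness", skewness), ("kurtosis", kurtosis)]:
    print(f"{name:9s} {value:.6f}")
```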

(2)

Median: 110.69

Upper quartile (Q3): 112.83

Lower quartile (Q1): 105.40

Interquartile range: 7.43

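(3) Draw the histogram, Q-Q plot, stem-and-leaf plot, and box plot

Histogram: [figure not reproduced]

Q-Q plot: [figure not reproduced; vertical axis from 95 to 125, horizontal axis (normal quantiles) from -2 to 2]

Stem-and-leaf plot:

Stem  Leaf        #
  12  2           1
  11  67          2
  11  00222234    8
  10  578         3
  10  014         3
   9  6           1
      ----+----+----+----+
Multiply Stem.Leaf by 10**+1

Box plot: [figure not reproduced]

Extreme observations (fragment of the lowest-value listing): 104.37 (obs 12), 105.40 (obs 13)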

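(4) Carry out the W test for normality (take α = 0.05).

Test                  Statistic            p Value
Shapiro-Wilk          W       0.978265     Pr < W       0.9304
Kolmogorov-Smirnov    D       0.128559     Pr > D      >0.1500
Cramer-von Mises      W-Sq    0.044882     Pr > W-Sq   >0.2500
Anderson-Darling      A-Sq    0.247567     Pr > A-Sq   >0.2500

From the table above, W0 = 0.978265 and p = 0.9304 > α = 0.05;

the null hypothesis H0 therefore cannot be rejected, and the data can be regarded as coming from a normal distribution.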

data t2;

input x1-x4;

cards;

16.7 26.7 6.4 35.0

18.2 28.0 3.2 29.7

16.7 26.7 2.1 34.9

18.1 26.7 4.3 31.5

16.7 26.0 3.0 32.7

18.1 30.2 7.0 34.9

20.2 30.5 4.8 34.4

20.2 29.5 5.5 36.2

21.5 31.5 5.8 36.5

18.8 30.6 5.4 35.4

21.6 27.8 5.4 34.1

21.3 29.5 5.8 35.8

;

proc corr cov pearson;

run;

(1) Compute the covariance matrix and the Pearson correlation matrix;

Covariance matrix:

Covariance Matrix, DF = 11

              x1            x2            x3            x4
x1   3.577196970   2.333257576   1.226439394   1.542196970
x2   2.333257576   3.524469697   1.573106061   2.067348485
x3   1.226439394   1.573106061   2.168106061   1.643257576
x4   1.542196970   2.067348485   1.643257576   4.064469697

Pearson correlation matrix (the second line of each row gives the p-value for H0: ρ = 0):

          x1        x2        x3        x4
x1   1.00000   0.65712   0.44039   0.40445
                0.0202    0.1519    0.1922
x2   0.65712   1.00000   0.56908   0.54622
      0.0202              0.0535    0.0662
x3   0.44039   0.56908   1.00000   0.55356
      0.1519    0.0535              0.0619
x4   0.40445   0.54622   0.55356   1.00000
      0.1922    0.0662    0.0619

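(2) Analyze the correlations among the variables (take α = 0.10)

From the upper triangle of the Pearson correlation matrix, only the p-values for r13 (0.1519) and r14 (0.1922) exceed α = 0.10,

so the correlations of x1 with x3 and of x1 with x4 are not significant. The remaining pairs, (x1, x2), (x2, x3), (x2, x4) and (x3, x4), have p-values below 0.10 and are significantly correlated at this level.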
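The covariance and correlation matrices of Question III can be recomputed with a few lines of Python, shown here as an illustrative cross-check rather than as the original method. The standard library has no t distribution, so the significance check compares the t statistic of each correlation with the two-sided 10% critical value for 10 degrees of freedom, 1.812, which is entered by hand as an assumption of this sketch.

```python
import math

# Data of Question III: one tuple (x1, x2, x3, x4) per observation.
rows = [
    (16.7, 26.7, 6.4, 35.0), (18.2, 28.0, 3.2, 29.7), (16.7, 26.7, 2.1, 34.9),
    (18.1, 26.7, 4.3, 31.5), (16.7, 26.0, 3.0, 32.7), (18.1, 30.2, 7.0, 34.9),
    (20.2, 30.5, 4.8, 34.4), (20.2, 29.5, 5.5, 36.2), (21.5, 31.5, 5.8, 36.5),
    (18.8, 30.6, 5.4, 35.4), (21.6, 27.8, 5.4, 34.1), (21.3, 29.5, 5.8, 35.8),
]
n = len(rows)
cols = list(zip(*rows))
p = len(cols)
means = [sum(c) / n for c in cols]


def cov(i, j):
    """Sample covariance of columns i and j (divisor n - 1)."""
    return sum((a - means[i]) * (b - means[j])
               for a, b in zip(cols[i], cols[j])) / (n - 1)


cov_matrix = [[cov(i, j) for j in range(p)] for i in range(p)]
corr_matrix = [[cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j])
                for j in range(p)] for i in range(p)]

T_CRIT = 1.812  # assumed table value: t(0.95; df = 10), two-sided alpha = 0.10
not_significant = []
for i in range(p):
    for j in range(i + 1, p):
        r = corr_matrix[i][j]
        t = r * math.sqrt((n - 2) / (1 - r * r))  # t statistic for H0: rho = 0
        if abs(t) <= T_CRIT:
            not_significant.append((i + 1, j + 1))

print(not_significant)  # variable pairs whose correlation is not significant
```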

IV:

data t4;

input num $ y x1 x2;

cards;

1 169 265 3782

2 81 98 3008

3 192 330 2450

4 116 195 2137

5 55 53 2560

6 162 274 2450

7 120 180 3254

8 223 375 3802

9 131 205 2838

10 67 86 2347

;

proc reg data=t4;

model y=x1-x2/i;

run;

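(1) Find the regression equation and give a practical interpretation of each parameter

               Parameter
Variable   DF   Estimate
Intercept   1    4.14260
x1          1    0.49482
x2          1    0.00890

From the output above,

$\hat\beta_0 = 4.14260$, $\hat\beta_1 = 0.49482$, $\hat\beta_2 = 0.00890$

The regression equation is y = 4.14260 + 0.49482 x1 + 0.00890 x2.

β0 is the baseline output. With the cost x2 held fixed, each additional unit of the number of workers x1 raises the output y by 0.49482 units;

likewise, with the number of workers x1 held fixed, each additional unit of the cost x2 raises the output y by 0.00890 units.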

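(2) Report the analysis of variance and the parameter estimates

Analysis of variance:

Analysis of Variance

                          Sum of         Mean
Source             DF    Squares       Square    F Value    Pr > F
Model               2      27272        13636    2935.52    <.0001
Error               7   32.51607      4.64515
Corrected Total     9      27304

From the analysis of variance table:

$\hat\sigma^2 = 4.64515$ (the error mean square)

R² = SSM/SST = 27272/27304 = 0.9988

The F value is 2935.52 with p < .0001, so the regression is highly significant.

Parameter estimates:

Parameter Estimates

                Parameter    Standard
Variable   DF    Estimate       Error    t Value    Pr > |t|
Intercept   1     4.14260     3.55511       1.17      0.2821
x1          1     0.49482     0.00734      67.43      <.0001
x2          1     0.00890     0.00133       6.70      0.0003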
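For readers without SAS, the least-squares fit of Question IV can be checked by solving the normal equations X'X b = X'y directly. The Python sketch below is an illustration under that assumption (PROC REG itself uses a different numerical procedure); `solve` is a small hypothetical helper written for this example.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


# Data of Question IV: (y, x1, x2) = (output, workers, cost).
data = [(169, 265, 3782), (81, 98, 3008), (192, 330, 2450), (116, 195, 2137),
        (55, 53, 2560), (162, 274, 2450), (120, 180, 3254), (223, 375, 3802),
        (131, 205, 2838), (67, 86, 2347)]

y = [float(row[0]) for row in data]
X = [[1.0, float(row[1]), float(row[2])] for row in data]  # intercept column first
k = 3                                                      # number of parameters

xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)                                     # (b0, b1, b2)

fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # error sum of squares
ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)                    # corrected total SS
mse = sse / (len(y) - k)                                   # estimate of sigma^2
r2 = 1 - sse / sst
f_value = ((sst - sse) / (k - 1)) / mse

print([round(b, 5) for b in beta], round(mse, 5), round(r2, 4), round(f_value, 2))
```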

Question V:

data t5;

input x1-x7;

cards;

12.5 16.4 16.7 22.8 29.3 3.017 26.6

7.8 9.9 10.2 12.6 17.6 0.841 10.6

13.4 10.9 9.9 10.9 13.9 1.772 17.8

19.1 19.8 19.0 29.7 39.6 2.449 35.8

8.0 9.8 8.9 11.9 16.2 0.789 13.7

9.7 4.2 4.2 4.6 6.5 0.874 3.9

0.6 0.7 0.7 0.8 1.1 0.056 1.0

13.9 9.4 9.3 9.8 13.3 2.126 17.1

9.1 11.3 9.5 12.2 16.4 1.327 11.6

;

proc princomp;

run;

Eigenvalues of the Correlation Matrix

     Eigenvalue    Difference    Proportion    Cumulative
1    6.36880695    5.97088220        0.9098        0.9098
2    0.39792475    0.23754034        0.0568        0.9667
3    0.16038442    0.11495709        0.0229        0.9896
4    0.04542733    0.02301248        0.0065        0.9961
5    0.02241485    0.01766603        0.0032        0.9993
6    0.00474882    0.00445593        0.0007        1.0000
7    0.00029289                      0.0000        1.0000

Eigenvectors

        Prin1      Prin2      Prin3      Prin4      Prin5      Prin6      Prin7
x1   0.348824   0.612363   0.682050   0.133246   0.136972   -.013602   -.037959
x2   0.390078   -.176727   0.002006   0.456233   -.590520   0.506231   0.058911
x3   0.391810   -.169297   -.110655   0.344580   -.130939   -.813353   -.090308
x4   0.385562   -.349634   0.020863   -.102818   0.402936   0.226117   -.710356
x5   0.383622   -.375484   0.096918   -.047799   0.464939   0.077911   0.691324
x6   0.353720   0.549207   -.715337   0.029717   0.204180   0.127268   0.052690
x7   0.389491   0.016038   0.031998   -.801013   -.441775   -.092768   0.040273

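Eigenvalues:

λ1 = 6.36880695, λ2 = 0.39792475, λ3 = 0.16038442, λ4 = 0.04542733, λ5 = 0.02241485, λ6 = 0.00474882, λ7 = 0.00029289.

The contribution rates and cumulative contribution rates are:

Proportion    Cumulative
    0.9098        0.9098
    0.0568        0.9667
    0.0229        0.9896
    0.0065        0.9961
    0.0032        0.9993
    0.0007        1.0000
    0.0000        1.0000

Principal components: the contribution rate of the first component alone already exceeds 90% (90.98%), so only the first principal component is retained:

w1 = 0.348824 x1 + 0.390078 x2 + 0.391810 x3 + 0.385562 x4 + 0.383622 x5 + 0.353720 x6 + 0.389491 x7

where, because the analysis is based on the correlation matrix, the xi denote the standardized variables.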
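The contribution rates follow directly from the eigenvalues: each eigenvalue is divided by their sum, which for a correlation matrix equals the number of variables. The Python sketch below redoes that arithmetic from the printed eigenvalues; the function name `components_needed` and the 90% threshold passed to it are illustrative choices, not something prescribed by the exam.

```python
import math

# Eigenvalues of the correlation matrix, copied from the PROC PRINCOMP output.
eigenvalues = [6.36880695, 0.39792475, 0.16038442, 0.04542733,
               0.02241485, 0.00474882, 0.00029289]

total = sum(eigenvalues)  # close to 7, the number of standardized variables
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for prop in proportions:
    running += prop
    cumulative.append(running)


def components_needed(cumulative_rates, threshold):
    """Smallest number of components whose cumulative rate reaches threshold."""
    for k, rate in enumerate(cumulative_rates, start=1):
        if rate >= threshold:
            return k
    return len(cumulative_rates)


# Loadings of the first principal component (column Prin1).
prin1 = [0.348824, 0.390078, 0.391810, 0.385562, 0.383622, 0.353720, 0.389491]
prin1_norm = math.sqrt(sum(v * v for v in prin1))  # eigenvectors have unit length

for ev, prop, cum in zip(eigenvalues, proportions, cumulative):
    print(f"{ev:.8f}  {prop:.4f}  {cum:.4f}")
print(components_needed(cumulative, 0.90), round(prin1_norm, 4))
```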

VI:

data t6;

input xy $ x1-x7;

cards;

1 36.05 7.13 7.75 16.67 11.68 2.38 12.88

1 37.69 7.01 8.94 16.15 11.08 0.83 11.67

1 38.69 6.01 8.82 14.79 11.44 1.74 13.23

1 37.75 9.61 8.49 13.15 9.76 1.28 11.28

1 35.71 8.04 8.31 15.13 7.76 1.41 13.25

1 39.77 8.49 12.94 19.27 11.05 2.04 13.29

1 40.91 7.32 8.94 17.60 12.75 1.14 14.80

1 33.70 7.59 10.98 18.82 14.73 1.78 10.10

1 35.02 4.72 6.28 10.03 7.15 1.93 10.39

2 52.41 7.70 9.98 12.53 11.70 2.31 14.69

2 52.65 3.84 9.16 13.03 15.26 1.98 14.57

2 55.85 5.50 7.45 9.55 9.52 2.21 16.30

2 44.68 7.32 14.51 17.13 12.08 1.26 11.57

2 45.79 7.66 10.36 16.56 12.86 2.75 11.69

2 50.37 11.35 13.30 19.25 14.59 2.75 14.87

;

data t61;

input x1-x7;

cards;

64.34 8.00 22.22 20.06 15.12 0.72 22.89

;

proc discrim data=t6 testdata=t61

out=a1

outstat=a2 outcross=a3

testout=a4 method=normal

list crosslist testlist all;

class xy;

var x1-x7;

priors equal;

run;

(1) Find the three covariance matrices;

S1 = (as printed, every entry of this matrix equals 13 times the corresponding entry of S below, so it is the pooled within-class SSCP matrix on 13 degrees of freedom rather than the covariance matrix of class 1)

Variable            x1            x2            x3            x4            x5            x6            x7
x1         136.3561056   -12.7039611   -32.1020333   -43.9701278    -4.7449722    -0.2789778    61.7996722
x2         -12.7039611    47.6857056    32.4559333    47.6934722     9.3323944     1.9405222    -0.4706611
x3         -32.1020333    32.4559333    63.9501333    77.4960000    29.5911333    -1.5631000   -11.4411667
x4         -43.9701278    47.6934722    77.4960000   131.1098722    63.9872611     1.9098222    -6.8091944
x5          -4.7449722     9.3323944    29.5911333    63.9872611    65.8556389     1.1910111    -1.2275389
x6          -0.2789778     1.9405222    -1.5631000     1.9098222     1.1910111     3.4526222     0.9389556
x7          61.7996722    -0.4706611   -11.4411667    -6.8091944    -1.2275389     0.9389556    37.3281722

S2 = (covariance matrix of class 2)

Variable             x1            x2            x3            x4            x5            x6            x7
x1          18.54121667   -3.74661667   -8.57356667  -11.76273000   -2.16989667    0.52238000    7.93868333
x2          -3.74661667    6.37465667    4.28286667    6.66303000    0.83049667    0.63964000   -0.64302333
x3          -8.57356667    4.28286667    6.95838667    8.26832000    1.92554667   -0.42338000   -3.00631333
x4         -11.76273000    6.66303000    8.26832000   12.81871000    4.33151000    0.26400000   -4.10899000
x5          -2.16989667    0.83049667    1.92554667    4.33151000    4.32841667    0.20144000   -0.75466333
x6           0.52238000    0.63964000   -0.42338000    0.26400000    0.20144000    0.30972000    0.29376000
x7           7.93868333   -0.64302333   -3.00631333   -4.10899000   -0.75466333    0.29376000    3.61457667

S = (pooled within-class covariance matrix)

Variable             x1            x2            x3            x4            x5            x6            x7
x1          10.48893120   -0.97722778   -2.46938718   -3.38231752   -0.36499786   -0.02145983    4.75382094
x2          -0.97722778    3.66813120    2.49661026    3.66872863    0.71787650    0.14927094   -0.03620470
x3          -2.46938718    2.49661026    4.91924103    5.96123077    2.27624103   -0.12023846   -0.88008974
x4          -3.38231752    3.66872863    5.96123077   10.08538248    4.92209701    0.14690940   -0.52378419
x5          -0.36499786    0.71787650    2.27624103    4.92209701    5.06581838    0.09161624   -0.09442607
x6          -0.02145983    0.14927094   -0.12023846    0.14690940    0.09161624    0.26558632    0.07222735
x7           4.75382094   -0.03620470   -0.88008974   -0.52378419   -0.09442607    0.07222735    2.87139786

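(2) Use distance discrimination to obtain the linear discriminant functions, and compute the misclassification rate by cross-validation;

Linear Discriminant Function for xy

Variable             1             2
Constant    -206.18758    -382.57458
x1            16.60240      23.14210
x2            -2.77150      -3.89531
x3            -5.80267      -5.94472
x4            14.17359      17.23215
x5            -8.00073     -10.19191
x6             7.49174      12.60276
x7           -22.87514     -32.83581

From the table above, the linear discriminant functions are:

W1 = -206.18758 + 16.60240 x1 - 2.77150 x2 - 5.80267 x3 + 14.17359 x4 - 8.00073 x5 + 7.49174 x6 - 22.87514 x7

W2 = -382.57458 + 23.14210 x1 - 3.89531 x2 - 5.94472 x3 + 17.23215 x4 - 10.19191 x5 + 12.60276 x6 - 32.83581 x7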

Posterior Probability of Membership in xy

        From    Classified
Obs       xy       into xy         1         2
  1        1        1         1.0000    0.0000
  2        1        1         1.0000    0.0000
  3        1        1         1.0000    0.0000
  4        1        1         1.0000    0.0000
  5        1        1         1.0000    0.0000
  6        1        2 *       0.0000    1.0000
  7        1        1         1.0000    0.0000
  8        1        1         1.0000    0.0000
  9        1        1         1.0000    0.0000
 10        2        2         0.0000    1.0000
 11        2        2         0.0000    1.0000
 12        2        2         0.0000    1.0000
 13        2        1 *       1.0000    0.0000
 14        2        2         0.0000    1.0000
 15        2        2         0.0000    1.0000

* Misclassified observation

Misclassification rate by cross-validation: P = 2/15 = 13.33%

(3) Decide which class the sample to be classified belongs to.

Posterior Probability of Membership in xy

       Classified
Obs       into xy         1         2
  1        2         0.0000    1.0000

The sample to be classified belongs to class 2.
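The classification of the new sample can be retraced by hand from the linear discriminant functions: evaluate W1 and W2 at the sample and assign it to the class with the larger score. The Python sketch below does this with the printed coefficients. It assumes equal priors, as in the `priors equal` statement, in which case the posterior probabilities are the normalized exponentials of the scores; the names `COEFFICIENTS`, `score` and `classify` are introduced here for illustration only.

```python
import math

# Constants and coefficients of the two linear discriminant functions,
# copied from the PROC DISCRIM output (class label -> (constant, weights)).
COEFFICIENTS = {
    1: (-206.18758, [16.60240, -2.77150, -5.80267, 14.17359,
                     -8.00073, 7.49174, -22.87514]),
    2: (-382.57458, [23.14210, -3.89531, -5.94472, 17.23215,
                     -10.19191, 12.60276, -32.83581]),
}


def score(x, label):
    """Value of the linear discriminant function of class `label` at x."""
    constant, weights = COEFFICIENTS[label]
    return constant + sum(w * v for w, v in zip(weights, x))


def classify(x):
    """Return (predicted class, scores, posteriors) assuming equal priors."""
    scores = {label: score(x, label) for label in COEFFICIENTS}
    top = max(scores.values())
    # Subtract the largest score before exponentiating to avoid overflow.
    expo = {label: math.exp(s - top) for label, s in scores.items()}
    norm = sum(expo.values())
    posteriors = {label: e / norm for label, e in expo.items()}
    predicted = max(scores, key=scores.get)
    return predicted, scores, posteriors


# Sample 16, the observation to be classified.
sample = [64.34, 8.00, 22.22, 20.06, 15.12, 0.72, 22.89]
predicted, scores, posteriors = classify(sample)

# Cross-validation error rate read off the listing: 2 of 15 misclassified.
cv_error = 2 / 15

print(predicted, {k: round(v, 4) for k, v in scores.items()},
      {k: round(v, 4) for k, v in posteriors.items()}, round(100 * cv_error, 2))
```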

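VII. (15 points) Carry out a cluster analysis on the data of the previous question (16 samples in total):

(1) Single-linkage (shortest-distance) method: write out the clustering process and draw the dendrogram (take nclusters=4);

(2) Complete-linkage (longest-distance) method: write out the clustering process, draw the dendrogram (take nclusters=4), and report the four clustering statistics;

(3) Give the result of fast clustering into 3 classes and draw the classification plot in a plane coordinate system.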

data t7;

input x1-x7;

cards;

36.05 7.13 7.75 16.67 11.68 2.38 12.88

37.69 7.01 8.94 16.15 11.08 0.83 11.67

38.69 6.01 8.82 14.79 11.44 1.74 13.23

37.75 9.61 8.49 13.15 9.76 1.28 11.28

35.71 8.04 8.31 15.13 7.76 1.41 13.25

39.77 8.49 12.94 19.27 11.05 2.04 13.29

40.91 7.32 8.94 17.60 12.75 1.14 14.80

33.70 7.59 10.98 18.82 14.73 1.78 10.10

35.02 4.72 6.28 10.03 7.15 1.93 10.39

52.41 7.70 9.98 12.53 11.70 2.31 14.69

52.65 3.84 9.16 13.03 15.26 1.98 14.57

55.85 5.50 7.45 9.55 9.52 2.21 16.30

44.68 7.32 14.51 17.13 12.08 1.26 11.57

45.79 7.66 10.36 16.56 12.86 2.75 11.69

50.37 11.35 13.30 19.25 14.59 2.75 14.87

64.34 8.00 22.22 20.06 15.12 0.72 22.89

;

proc cluster data=t7 method=sin std nonorm outtree=tree1;

var x1-x7;

run;

proc tree data=tree1 graphics horizontal out=c1 nclusters=4;

run;

proc print data=c1;

run;

proc cluster data=t7 method=com std nonorm outtree=tree2;

var x1-x7;

run;

proc tree data=tree2 graphics horizontal out=c2 nclusters=4;

run;

proc print data=c2;

run;

/* t7 holds all 16 samples; the original statement read data=t6, which has only 15 */

proc fastclus maxc=3 distance list cluster=c

data=t7 out=d;

var x1-x7;

run;

proc plot;

plot x2*x1=c;

run;

(1) Single-linkage (shortest-distance) method: write out the clustering process and draw the dendrogram (take nclusters=4);

Cluster History

[The Clusters Joined and FREQ columns are illegible in the source; only the number of clusters and the minimum distance are reproduced.]

NCL    Min Dist
 15      1.3976
 14      1.4581
 13      1.525
 12      1.5721
 11      1.6783
 10      1.8356
  9      1.8609
  8      1.865
  7      1.9501
  6      2.0097
  5      2.126
  4      2.6429
  3      2.707
  2      2.7151
  1      5.0941

[Dendrogram for the single-linkage method, not reproduced. Leaves from top to bottom as far as shown: OB1, OB3, OB14, OB2, OB7, OB4, OB5, OB6, OB13, OB10, OB12, OB8, OB11]
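The merging process behind a cluster history of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the PROC CLUSTER implementation: it assumes that `std` standardizes each variable to mean 0 and sample standard deviation 1 and that `nonorm` leaves the Euclidean distances unscaled, and it has not been checked against the SAS listing. A merged cluster keeps the label of its first member, which differs from the CLn naming used by SAS.

```python
import math

# The 16 samples (x1..x7) of Question VI, in the order given there.
samples = [
    [36.05, 7.13, 7.75, 16.67, 11.68, 2.38, 12.88],
    [37.69, 7.01, 8.94, 16.15, 11.08, 0.83, 11.67],
    [38.69, 6.01, 8.82, 14.79, 11.44, 1.74, 13.23],
    [37.75, 9.61, 8.49, 13.15, 9.76, 1.28, 11.28],
    [35.71, 8.04, 8.31, 15.13, 7.76, 1.41, 13.25],
    [39.77, 8.49, 12.94, 19.27, 11.05, 2.04, 13.29],
    [40.91, 7.32, 8.94, 17.60, 12.75, 1.14, 14.80],
    [33.70, 7.59, 10.98, 18.82, 14.73, 1.78, 10.10],
    [35.02, 4.72, 6.28, 10.03, 7.15, 1.93, 10.39],
    [52.41, 7.70, 9.98, 12.53, 11.70, 2.31, 14.69],
    [52.65, 3.84, 9.16, 13.03, 15.26, 1.98, 14.57],
    [55.85, 5.50, 7.45, 9.55, 9.52, 2.21, 16.30],
    [44.68, 7.32, 14.51, 17.13, 12.08, 1.26, 11.57],
    [45.79, 7.66, 10.36, 16.56, 12.86, 2.75, 11.69],
    [50.37, 11.35, 13.30, 19.25, 14.59, 2.75, 14.87],
    [64.34, 8.00, 22.22, 20.06, 15.12, 0.72, 22.89],
]


def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def single_linkage(points, nclusters=1):
    """Merge the two closest clusters until `nclusters` remain.

    Returns (history, clusters). Each history row is
    (clusters left, label a, label b, size of merged cluster, distance).
    """
    clusters = {i + 1: [i] for i in range(len(points))}  # label -> member indices
    history = []
    while len(clusters) > nclusters:
        best = None
        labels = sorted(clusters)
        for pos, a in enumerate(labels):
            for b in labels[pos + 1:]:
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        history.append((len(clusters), a, b, len(clusters[a]), d))
    return history, clusters


z = standardize(samples)
history, _ = single_linkage(z)
_, four_clusters = single_linkage(z, nclusters=4)

for ncl, a, b, freq, dist in history:
    print(f"{ncl:3d}  OB{a:<3d} OB{b:<3d} {freq:3d}  {dist:.4f}")
```

Stopping the merges at `nclusters=4` plays the role of the `nclusters=4` option of PROC TREE; `four_clusters` then maps each surviving label to the zero-based indices of its members.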
