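Examination format: independently set paper
佛山科学技术学院 (Foshan University), Semester 1, Academic Year 2008-2009
Data Analysis: Final Examination, Paper A
Major and class:          Name:          Student ID:
Question   1  2  3  4  5  6  7  8  9  10  11  12  Total
Score
Instructions: 1. Read each question carefully and carry out the required computations in SAS.
2. Copy the SAS programs and the relevant output to the end of the paper as your answers.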
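1. (12 points) Short-answer questions on SAS:
(1) In SAS under Windows, which three parts make up the SAS interface?
The Log window, the Editor window, and the Output window.
(2) How do you read in a non-numeric (character) variable?
Put "$" after the variable name in the INPUT statement.
(3) What marker must be added when data are entered in free format rather than fixed format?
Add "@@" at the end of the INPUT statement, so that several observations can be read from one data line.
(4) Write down the formula for the trimean.
$\hat{M} = \frac{1}{4}Q_1 + \frac{1}{2}M + \frac{1}{4}Q_3$, where $M$ is the median and $Q_1$, $Q_3$ are the lower and upper quartiles.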
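2. (15 points) The year-on-year GDP growth index of Beijing for 1978-1995 is as follows:
100.00 107.57 112.42  96.21 121.58 107.21 117.16 116.19 101.37
109.78 112.83 104.37 105.40 109.50 111.60 112.10 113.50 112.40
(1) Compute the mean, variance, standard deviation, coefficient of variation, skewness, and kurtosis;
(2) Compute the median, the upper and lower quartiles, and the interquartile range;
(3) Draw the histogram, Q-Q plot, stem-and-leaf plot, and box plot;
(4) Carry out the Shapiro-Wilk W test for normality (take α = 0.05).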
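3. (15 points) The following data are given:
  x1    x2    x3    x4
 16.7  26.7   6.4  35.0
 18.2  28.0   3.2  29.7
 16.7  26.7   2.1  34.9
 18.1  26.7   4.3  31.5
 16.7  26.0   3.0  32.7
 18.1  30.2   7.0  34.9
 20.2  30.5   4.8  34.4
 20.2  29.5   5.5  36.2
 21.5  31.5   5.8  36.5
 18.8  30.6   5.4  35.4
 21.6  27.8   5.4  34.1
 21.3  29.5   5.8  35.8
(1) Compute the covariance matrix and the Pearson correlation matrix;
(2) Analyze the correlations among the variables (take α = 0.10).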
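4. (15 points) The output y of a factory, the number of workers x1, and the cost x2 are given below:
No.    y    x1    x2
  1   169   265   3782
  2    81    98   3008
  3   192   330   2450
  4   116   195   2137
  5    55    53   2560
  6   162   274   2450
  7   120   180   3254
  8   223   375   3802
  9   131   205   2838
 10    67    86   2347
(1) Find the regression equation and give a practical interpretation of each parameter;
(2) Give the analysis-of-variance table and the parameter estimates.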
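5. (13 points) The following data are given:
  x1    x2    x3    x4    x5     x6     x7
 12.5  16.4  16.7  22.8  29.3  3.017  26.6
  7.8   9.9  10.2  12.6  17.6  0.841  10.6
 13.4  10.9   9.9  10.9  13.9  1.772  17.8
 19.1  19.8  19.0  29.7  39.6  2.449  35.8
  8.0   9.8   8.9  11.9  16.2  0.789  13.7
  9.7   4.2   4.2   4.6   6.5  0.874   3.9
  0.6   0.7   0.7   0.8   1.1  0.056   1.0
 13.9   9.4   9.3   9.8  13.3  2.126  17.1
  9.1  11.3   9.5  12.2  16.4  1.327  11.6
Carry out a principal component analysis of this sample and obtain the corresponding principal components.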
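6. (15 points) The following data are given (class labels as used in the program for this question; sample 16 is the one to be classified):
No.  Class     x1     x2     x3     x4     x5    x6     x7
  1    1     36.05   7.13   7.75  16.67  11.68  2.38  12.88
  2    1     37.69   7.01   8.94  16.15  11.08  0.83  11.67
  3    1     38.69   6.01   8.82  14.79  11.44  1.74  13.23
  4    1     37.75   9.61   8.49  13.15   9.76  1.28  11.28
  5    1     35.71   8.04   8.31  15.13   7.76  1.41  13.25
  6    1     39.77   8.49  12.94  19.27  11.05  2.04  13.29
  7    1     40.91   7.32   8.94  17.60  12.75  1.14  14.80
  8    1     33.70   7.59  10.98  18.82  14.73  1.78  10.10
  9    1     35.02   4.72   6.28  10.03   7.15  1.93  10.39
 10    2     52.41   7.70   9.98  12.53  11.70  2.31  14.69
 11    2     52.65   3.84   9.16  13.03  15.26  1.98  14.57
 12    2     55.85   5.50   7.45   9.55   9.52  2.21  16.30
 13    2     44.68   7.32  14.51  17.13  12.08  1.26  11.57
 14    2     45.79   7.66  10.36  16.56  12.86  2.75  11.69
 15    2     50.37  11.35  13.30  19.25  14.59  2.75  14.87
 16    ?     64.34   8.00  22.22  20.06  15.12  0.72  22.89
(1) Obtain the three covariance matrices;
(2) Use distance discrimination to obtain the linear discriminant functions, and compute the misclassification rate by cross-validation;
(3) Decide which class the sample to be classified belongs to.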
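7. (15 points) Using the data of the previous question (16 samples in total), carry out a cluster analysis:
(1) Single linkage (nearest neighbor): write out the clustering process and draw the dendrogram (take nclusters=4);
(2) Complete linkage (farthest neighbor): write out the clustering process, draw the dendrogram (take nclusters=4), and obtain the four clustering statistics;
(3) Give the result of the fast clustering method with 3 clusters, and plot the classification in a plane coordinate system.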
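Question 1:
(1) The SAS interface consists of the Output window, the Log window, and the Editor window.
(2) Put the "$" symbol after a non-numeric variable.
(3) Data entered in free format should be marked with "@@".
(4) The trimean is computed as $\hat{M} = \frac{1}{4}Q_1 + \frac{1}{2}M + \frac{1}{4}Q_3$, where $M$ is the median and $Q_1$, $Q_3$ are the lower and upper quartiles.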
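The trimean formula can be illustrated with a minimal Python sketch. The function name `trimean` is hypothetical, and the inputs are the quartiles of the Question 2 data (Q1 = 105.40, median = 110.69, Q3 = 112.83); this is a cross-check outside SAS, not part of the exam answer.

```python
def trimean(q1: float, median: float, q3: float) -> float:
    """Trimean: the median counted twice, averaged with the two quartiles."""
    return 0.25 * q1 + 0.5 * median + 0.25 * q3


# Quartiles and median of the Beijing GDP index data from Question 2.
gdp_trimean = trimean(105.40, 110.69, 112.83)
print(f"trimean = {gdp_trimean:.4f}")
```

Because the median carries half of the weight, the trimean stays close to the median while still reflecting asymmetry between the two quartiles.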
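Question 2, program:
data t1;
  input x @@;                 /* @@ reads several observations per data line */
cards;
100.00 107.57 112.42 96.21 121.58 107.21 117.16 116.19 101.37
109.78 112.83 104.37 105.40 109.50 111.60 112.1 113.50 112.40
;
proc univariate data=t1 plot normal;   /* moments, quantiles, plots, normality tests */
  var x;
run;
proc capability data=t1 graphics normal;
  histogram x / normal;
  qqplot x / normal(....);    /* options elided in the original */
run;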
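N                         18    Sum Weights                 18
Mean              109.510556    Sum Observations       1971.19
Std Deviation     6.36948929    Variance            40.5703938
Skewness          -0.3324812    Kurtosis            0.05978054
Uncorrected SS    216555.809    Corrected SS        689.696694
Coeff Variation   5.81632451    Std Error Mean      1.50130302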
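(1) From the output above:
Mean: 109.510556    Variance: 40.5703938
Standard deviation: 6.36948929
Coefficient of variation: 5.81632451 (percent)    Kurtosis: 0.05978054
Skewness: -0.3324812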
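As a cross-check of these moments, the following sketch recomputes them in plain Python. The skewness and kurtosis formulas are the bias-corrected sample versions, which are assumed here to be what PROC UNIVARIATE reports by default (divisor n - 1 for the variance); the variable names are illustrative only.

```python
import math

# Beijing GDP index, 1978-1995 (Question 2 data).
x = [100.00, 107.57, 112.42, 96.21, 121.58, 107.21, 117.16, 116.19, 101.37,
     109.78, 112.83, 104.37, 105.40, 109.50, 111.60, 112.10, 113.50, 112.40]

n = len(x)
mean = sum(x) / n
css = sum((v - mean) ** 2 for v in x)        # corrected sum of squares
variance = css / (n - 1)                     # sample variance
std = math.sqrt(variance)
cv = 100 * std / mean                        # coefficient of variation, in percent

z = [(v - mean) / std for v in x]            # standardized values
skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(f"mean={mean:.6f} var={variance:.7f} std={std:.8f} cv={cv:.8f}")
print(f"skewness={skewness:.7f} kurtosis={kurtosis:.8f}")
```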
(2)
Median: 110.69
Upper quartile (Q3): 112.83
Lower quartile (Q1): 105.40
Interquartile range (Q3 - Q1): 7.43
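The quartiles can be reproduced with a short sketch. It assumes the empirical-distribution-function rule with averaging (take the next order statistic unless n·p is an integer, in which case average two neighbors), which is understood to be the SAS default percentile definition; `edf_quantile` is a hypothetical helper name.

```python
import math

x = [100.00, 107.57, 112.42, 96.21, 121.58, 107.21, 117.16, 116.19, 101.37,
     109.78, 112.83, 104.37, 105.40, 109.50, 111.60, 112.10, 113.50, 112.40]


def edf_quantile(values, p):
    """Quantile for 0 < p < 1 using the EDF rule with averaging."""
    s = sorted(values)
    np_ = len(s) * p
    j = math.floor(np_)
    g = np_ - j
    if g < 1e-12:                      # n*p is an integer: average x(j), x(j+1)
        return (s[j - 1] + s[j]) / 2
    return s[j]                        # otherwise take x(j+1) (0-based index j)


q1 = edf_quantile(x, 0.25)
median = edf_quantile(x, 0.50)
q3 = edf_quantile(x, 0.75)
iqr = q3 - q1
print(q1, median, q3, round(iqr, 2))
```

Other quantile conventions (for example linear interpolation) can give slightly different quartiles for the same 18 values, so the rule in use matters when comparing software output.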
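(3) Histogram, Q-Q plot, stem-and-leaf plot, and box plot
Histogram: [figure not reproduced]
Q-Q plot: [figure not reproduced; vertical axis x from about 95 to 125, horizontal axis normal quantiles from -2 to 2]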
Stem-and-leaf plot:
Stem  Leaf        #
  12  2           1
  11  67          2
  11  00222234    8
  10  578         3
  10  014         3
   9  6           1
      ----+----+----+----+
Multiply Stem.Leaf by 10**+1
Box plot: [figure not reproduced]
Surviving fragment of the extreme-observations table (lowest values):
Value     Obs
104.37     12
105.40     13
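(4) Shapiro-Wilk W test for normality (take α = 0.05).
Test                   Statistic             p Value
Shapiro-Wilk           W       0.978265      Pr < W       0.9304
Kolmogorov-Smirnov     D       0.128559      Pr > D      >0.1500
Cramer-von Mises       W-Sq    0.044882      Pr > W-Sq   >0.2500
Anderson-Darling       A-Sq    0.247567      Pr > A-Sq   >0.2500
From the output above, W = 0.978265 with p = 0.9304 > α = 0.05.
The null hypothesis H0 (the data come from a normal distribution) therefore cannot be rejected: the departure from normality is not significant.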
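Question 3:
data t2;
  input x1-x4;
cards;
16.7 26.7 6.4 35.0
18.2 28.0 3.2 29.7
16.7 26.7 2.1 34.9
18.1 26.7 4.3 31.5
16.7 26.0 3.0 32.7
18.1 30.2 7.0 34.9
20.2 30.5 4.8 34.4
20.2 29.5 5.5 36.2
21.5 31.5 5.8 36.5
18.8 30.6 5.4 35.4
21.6 27.8 5.4 34.1
21.3 29.5 5.8 35.8
;
proc corr data=t2 cov pearson;   /* covariance and Pearson correlation matrices */
  var x1-x4;
run;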
(1) Covariance matrix and Pearson correlation matrix
Covariance matrix, DF = 11:
        x1            x2            x3            x4
x1   3.577196970   2.333257576   1.226439394   1.542196970
x2   2.333257576   3.524469697   1.573106061   2.067348485
x3   1.226439394   1.573106061   2.168106061   1.643257576
x4   1.542196970   2.067348485   1.643257576   4.064469697
Pearson correlation matrix (the second line of each row gives the p-value):
        x1        x2        x3        x4
x1   1.00000   0.65712   0.44039   0.40445
               0.0202    0.1519    0.1922
x2   0.65712   1.00000   0.56908   0.54622
     0.0202              0.0535    0.0662
x3   0.44039   0.56908   1.00000   0.55356
     0.1519    0.0535              0.0619
x4   0.40445   0.54622   0.55356   1.00000
     0.1922    0.0662    0.0619
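(2) Correlations among the variables (take α = 0.10)
In the upper triangle of the Pearson correlation matrix, the p-values for r13 (0.1519) and r14 (0.1922) exceed 0.10, so x1 is not significantly correlated with x3 or x4 at this level. The other pairs have p-values below 0.10 (r12: 0.0202, r23: 0.0535, r24: 0.0662, r34: 0.0619) and are significantly correlated.
Overall, the correlations among these variables are moderate (0.40 to 0.66) rather than strong.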
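The link between the two matrices, and the significance decisions at α = 0.10, can be checked with a small sketch. It converts the printed covariance matrix to correlations via r_ij = s_ij / sqrt(s_ii s_jj) and compares the statistic t = r sqrt(n-2) / sqrt(1-r²) with the two-sided critical value t(0.95; 10) ≈ 1.812; the helper names are hypothetical.

```python
import math

# Covariance matrix of x1..x4 as printed by PROC CORR (n = 12 observations).
cov = [
    [3.577196970, 2.333257576, 1.226439394, 1.542196970],
    [2.333257576, 3.524469697, 1.573106061, 2.067348485],
    [1.226439394, 1.573106061, 2.168106061, 1.643257576],
    [1.542196970, 2.067348485, 1.643257576, 4.064469697],
]
n = 12
T_CRIT = 1.812  # two-sided 10% critical value of Student's t with 10 df


def corr_from_cov(s):
    """Pearson correlation matrix obtained from a covariance matrix."""
    k = len(s)
    sd = [math.sqrt(s[i][i]) for i in range(k)]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]


corr = corr_from_cov(cov)
significant = []
for i in range(4):
    for j in range(i + 1, 4):
        r = corr[i][j]
        t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
        if abs(t) > T_CRIT:
            significant.append((i + 1, j + 1))
        print(f"r{i + 1}{j + 1} = {r:.5f}, t = {t:.3f}")
print("significant at alpha = 0.10:", significant)
```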
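Question 4:
data t4;
  input num $ y x1 x2;
cards;
1 169 265 3782
2 81 98 3008
3 192 330 2450
4 116 195 2137
5 55 53 2560
6 162 274 2450
7 120 180 3254
8 223 375 3802
9 131 205 2838
10 67 86 2347
;
proc reg data=t4;
  model y = x1-x2 / i;   /* the I option also prints the inverse of X'X */
run;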
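(1) Regression equation and practical interpretation of the parameters
                 Parameter
Variable    DF    Estimate
Intercept    1     4.14260
x1           1     0.49482
x2           1     0.00890
From the output above, $\hat{\beta}_0 = 4.14260$, $\hat{\beta}_1 = 0.49482$, $\hat{\beta}_2 = 0.00890$.
The regression equation is y = 4.14260 + 0.49482 x1 + 0.00890 x2,
where y is the factory's output, x1 the number of workers, and x2 the cost.
β0 is the baseline output. With the cost x2 held fixed, each additional unit of x1 (workers) raises the output y by 0.49482 units;
likewise, with the number of workers x1 held fixed, each additional unit of cost x2 raises the output y by 0.00890 units.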
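The least-squares estimates can be recomputed from the ten data rows by solving the normal equations (X'X) b = X'y. The sketch below uses only the standard library; `solve` is a hypothetical helper implementing Gaussian elimination, and nothing here is claimed to mirror how PROC REG works internally.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * sol[c] for c in range(i + 1, n))
        sol[i] = (m[i][n] - s) / m[i][i]
    return sol


# (y, x1, x2) for the ten observations of Question 4.
rows = [(169, 265, 3782), (81, 98, 3008), (192, 330, 2450), (116, 195, 2137),
        (55, 53, 2560), (162, 274, 2450), (120, 180, 3254), (223, 375, 3802),
        (131, 205, 2838), (67, 86, 2347)]

X = [[1.0, float(x1), float(x2)] for _, x1, x2 in rows]   # design matrix with intercept
y = [float(r[0]) for r in rows]

xtx = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(3)] for i in range(3)]
xty = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

sse = sum((yk - (b0 + b1 * xk[1] + b2 * xk[2])) ** 2 for xk, yk in zip(X, y))
print(f"b0={b0:.5f} b1={b1:.5f} b2={b2:.5f} SSE={sse:.5f}")
```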
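(2) Analysis of variance and parameter estimates
Analysis of variance:
                      Analysis of Variance
                            Sum of        Mean
Source             DF      Squares      Square    F Value    Pr > F
Model               2        27272       13636    2935.52    <.0001
Error               7     32.51607     4.64515
Corrected Total     9        27304
From the analysis-of-variance table:
$\hat{\sigma}^2 = \mathrm{MSE} = 4.64515$
$R^2 = \mathrm{SSM}/\mathrm{SST} = 27272/27304 = 0.9988$
The F value is 2935.52 (p < 0.0001), so the regression is highly significant.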
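The quantities read off the table follow from its sums of squares and degrees of freedom. A minimal sketch of that arithmetic, using the rounded values printed above (so the last digits can differ slightly from the SAS output):

```python
# Values taken from the ANOVA table of Question 4.
ss_model, ss_error, ss_total = 27272.0, 32.51607, 27304.0
df_model, df_error = 2, 7

ms_model = ss_model / df_model          # mean square for the model
ms_error = ss_error / df_error          # mean square error, estimate of sigma^2
f_value = ms_model / ms_error           # overall F statistic
r_squared = ss_model / ss_total         # coefficient of determination

print(f"MSE={ms_error:.5f} F={f_value:.2f} R2={r_squared:.4f}")
```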
Parameter estimates:
                    Parameter Estimates
                 Parameter    Standard
Variable    DF    Estimate       Error    t Value    Pr > |t|
Intercept    1     4.14260     3.55511       1.17      0.2821
x1           1     0.49482     0.00734      67.43      <.0001
x2           1     0.00890     0.00133       6.70      0.0003
Question 5:
data t5;
  input x1-x7;
cards;
12.5 16.4 16.7 22.8 29.3 3.017 26.6
7.8 9.9 10.2 12.6 17.6 0.841 10.6
13.4 10.9 9.9 10.9 13.9 1.772 17.8
19.1 19.8 19.0 29.7 39.6 2.449 35.8
8.0 9.8 8.9 11.9 16.2 0.789 13.7
9.7 4.2 4.2 4.6 6.5 0.874 3.9
0.6 0.7 0.7 0.8 1.1 0.056 1.0
13.9 9.4 9.3 9.8 13.3 2.126 17.1
9.1 11.3 9.5 12.2 16.4 1.327 11.6
;
proc princomp data=t5;   /* principal components of the correlation matrix */
  var x1-x7;
run;
Eigenvalues of the Correlation Matrix
      Eigenvalue    Difference    Proportion    Cumulative
1     6.36880695    5.97088220      0.9098        0.9098
2     0.39792475    0.23754034      0.0568        0.9667
3     0.16038442    0.11495709      0.0229        0.9896
4     0.04542733    0.02301248      0.0065        0.9961
5     0.02241485    0.01766603      0.0032        0.9993
6     0.00474882    0.00445593      0.0007        1.0000
7     0.00029289                    0.0000        1.0000
Eigenvectors
        Prin1      Prin2      Prin3      Prin4      Prin5      Prin6      Prin7
x1   0.348824   0.612363   0.682050   0.133246   0.136972   -.013602   -.037959
x2   0.390078   -.176727   0.002006   0.456233   -.590520   0.506231   0.058911
x3   0.391810   -.169297   -.110655   0.344580   -.130939   -.813353   -.090308
x4   0.385562   -.349634   0.020863   -.102818   0.402936   0.226117   -.710356
x5   0.383622   -.375484   0.096918   -.047799   0.464939   0.077911   0.691324
x6   0.353720   0.549207   -.715337   0.029717   0.204180   0.127268   0.052690
x7   0.389491   0.016038   0.031998   -.801013   -.441775   -.092768   0.040273
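Eigenvalues:
λ1 = 6.36880695, λ2 = 0.39792475, λ3 = 0.16038442, λ4 = 0.04542733, λ5 = 0.02241485, λ6 = 0.00474882, λ7 = 0.00029289.
The proportions of variance explained and the cumulative proportions are:
Proportion    Cumulative
0.9098        0.9098
0.0568        0.9667
0.0229        0.9896
0.0065        0.9961
0.0032        0.9993
0.0007        1.0000
0.0000        1.0000
Principal components: the first component alone already explains more than 90% of the total variance, so it is retained as the principal component:
w1 = 0.348824 x1 + 0.390078 x2 + 0.391810 x3 + 0.385562 x4 + 0.383622 x5 + 0.353720 x6 + 0.389491 x7
Here x1, ..., x7 are the standardized variables, because the analysis is based on the correlation matrix.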
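The proportion and cumulative columns follow directly from the eigenvalues: each proportion is λ_i divided by the sum of all eigenvalues, which equals 7 for a 7-variable correlation matrix. A short sketch (the 85% threshold in the last step is an illustrative assumption, not a value from the exam):

```python
# Eigenvalues of the correlation matrix from the PROC PRINCOMP output.
eigenvalues = [6.36880695, 0.39792475, 0.16038442, 0.04542733,
               0.02241485, 0.00474882, 0.00029289]

total = sum(eigenvalues)                       # equals the number of variables
proportion = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportion:
    running += p
    cumulative.append(running)


def components_needed(cum, threshold):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    return next(i + 1 for i, c in enumerate(cum) if c >= threshold)


for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Prin{i}: proportion={p:.4f} cumulative={c:.4f}")
print("components for 85%:", components_needed(cumulative, 0.85))
```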
Question 6:
data t6;                       /* training samples: class label xy, then x1-x7 */
  input xy $ x1-x7;
cards;
1 36.05 7.13 7.75 16.67 11.68 2.38 12.88
1 37.69 7.01 8.94 16.15 11.08 0.83 11.67
1 38.69 6.01 8.82 14.79 11.44 1.74 13.23
1 37.75 9.61 8.49 13.15 9.76 1.28 11.28
1 35.71 8.04 8.31 15.13 7.76 1.41 13.25
1 39.77 8.49 12.94 19.27 11.05 2.04 13.29
1 40.91 7.32 8.94 17.60 12.75 1.14 14.80
1 33.70 7.59 10.98 18.82 14.73 1.78 10.10
1 35.02 4.72 6.28 10.03 7.15 1.93 10.39
2 52.41 7.70 9.98 12.53 11.70 2.31 14.69
2 52.65 3.84 9.16 13.03 15.26 1.98 14.57
2 55.85 5.50 7.45 9.55 9.52 2.21 16.30
2 44.68 7.32 14.51 17.13 12.08 1.26 11.57
2 45.79 7.66 10.36 16.56 12.86 2.75 11.69
2 50.37 11.35 13.30 19.25 14.59 2.75 14.87
;
data t61;                      /* the sample to be classified */
  input x1-x7;
cards;
64.34 8.00 22.22 20.06 15.12 0.72 22.89
;
proc discrim data=t6 testdata=t61
     out=a1
     outstat=a2 outcross=a3
     testout=a4 method=normal
     list crosslist testlist all;
  class xy;
  var x1-x7;
  priors equal;
run;
(1) The three covariance matrices
S1 = (as printed; every entry equals 13 times the corresponding entry of S below, so this appears to be the pooled within-class SSCP matrix rather than the covariance matrix of class 1)
Variable            x1            x2            x3            x4            x5            x6            x7
x1         136.3561056   -12.7039611   -32.1020333   -43.9701278    -4.7449722    -0.2789778    61.7996722
x2         -12.7039611    47.6857056    32.4559333    47.6934722     9.3323944     1.9405222    -0.4706611
x3         -32.1020333    32.4559333    63.9501333    77.4960000    29.5911333    -1.5631000   -11.4411667
x4         -43.9701278    47.6934722    77.4960000   131.1098722    63.9872611     1.9098222    -6.8091944
x5          -4.7449722     9.3323944    29.5911333    63.9872611    65.8556389     1.1910111    -1.2275389
x6          -0.2789778     1.9405222    -1.5631000     1.9098222     1.1910111     3.4526222     0.9389556
x7          61.7996722    -0.4706611   -11.4411667    -6.8091944    -1.2275389     0.9389556    37.3281722
S2 = (covariance matrix of class 2)
Variable            x1            x2            x3            x4            x5            x6            x7
x1         18.54121667   -3.74661667   -8.57356667  -11.76273000   -2.16989667    0.52238000    7.93868333
x2         -3.74661667    6.37465667    4.28286667    6.66303000    0.83049667    0.63964000   -0.64302333
x3         -8.57356667    4.28286667    6.95838667    8.26832000    1.92554667   -0.42338000   -3.00631333
x4        -11.76273000    6.66303000    8.26832000   12.81871000    4.33151000    0.26400000   -4.10899000
x5         -2.16989667    0.83049667    1.92554667    4.33151000    4.32841667    0.20144000   -0.75466333
x6          0.52238000    0.63964000   -0.42338000    0.26400000    0.20144000    0.30972000    0.29376000
x7          7.93868333   -0.64302333   -3.00631333   -4.10899000   -0.75466333    0.29376000    3.61457667
S = (pooled within-class covariance matrix)
Variable            x1            x2            x3            x4            x5            x6            x7
x1         10.48893120   -0.97722778   -2.46938718   -3.38231752   -0.36499786   -0.02145983    4.75382094
x2         -0.97722778    3.66813120    2.49661026    3.66872863    0.71787650    0.14927094   -0.03620470
x3         -2.46938718    2.49661026    4.91924103    5.96123077    2.27624103   -0.12023846   -0.88008974
x4         -3.38231752    3.66872863    5.96123077   10.08538248    4.92209701    0.14690940   -0.52378419
x5         -0.36499786    0.71787650    2.27624103    4.92209701    5.06581838    0.09161624   -0.09442607
x6         -0.02145983    0.14927094   -0.12023846    0.14690940    0.09161624    0.26558632    0.07222735
x7          4.75382094   -0.03620470   -0.88008974   -0.52378419   -0.09442607    0.07222735    2.87139786
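The within-class and pooled covariance matrices can be recomputed from the 15 training samples with a short sketch. It uses the usual definitions (divisor n_g - 1 within each class, and pooled S = [(n1-1)S1 + (n2-1)S2] / (n1+n2-2)); the helper name `cov_matrix` is hypothetical.

```python
class1 = [
    [36.05, 7.13, 7.75, 16.67, 11.68, 2.38, 12.88],
    [37.69, 7.01, 8.94, 16.15, 11.08, 0.83, 11.67],
    [38.69, 6.01, 8.82, 14.79, 11.44, 1.74, 13.23],
    [37.75, 9.61, 8.49, 13.15, 9.76, 1.28, 11.28],
    [35.71, 8.04, 8.31, 15.13, 7.76, 1.41, 13.25],
    [39.77, 8.49, 12.94, 19.27, 11.05, 2.04, 13.29],
    [40.91, 7.32, 8.94, 17.60, 12.75, 1.14, 14.80],
    [33.70, 7.59, 10.98, 18.82, 14.73, 1.78, 10.10],
    [35.02, 4.72, 6.28, 10.03, 7.15, 1.93, 10.39],
]
class2 = [
    [52.41, 7.70, 9.98, 12.53, 11.70, 2.31, 14.69],
    [52.65, 3.84, 9.16, 13.03, 15.26, 1.98, 14.57],
    [55.85, 5.50, 7.45, 9.55, 9.52, 2.21, 16.30],
    [44.68, 7.32, 14.51, 17.13, 12.08, 1.26, 11.57],
    [45.79, 7.66, 10.36, 16.56, 12.86, 2.75, 11.69],
    [50.37, 11.35, 13.30, 19.25, 14.59, 2.75, 14.87],
]


def cov_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


s1, s2 = cov_matrix(class1), cov_matrix(class2)
n1, n2 = len(class1), len(class2)
pooled = [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
           for j in range(7)] for i in range(7)]

print(f"S1[x1,x1]={s1[0][0]:.6f}  S2[x1,x1]={s2[0][0]:.8f}  S[x1,x1]={pooled[0][0]:.8f}")
```

On these data the class 1 variance of x1 comes out near 5.46, while 13 times the pooled variance gives the 136.356 shown in the first printed matrix.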
(2) Linear discriminant functions by distance discrimination, and the cross-validated misclassification rate
Linear Discriminant Function for xy
Variable            1             2
Constant    -206.18758    -382.57458
x1            16.60240      23.14210
x2            -2.77150      -3.89531
x3            -5.80267      -5.94472
x4            14.17359      17.23215
x5            -8.00073     -10.19191
x6             7.49174      12.60276
x7           -22.87514     -32.83581
From the output above, the linear discriminant functions are:
W1 = -206.18758 + 16.60240 x1 - 2.77150 x2 - 5.80267 x3 + 14.17359 x4 - 8.00073 x5 + 7.49174 x6 - 22.87514 x7
W2 = -382.57458 + 23.14210 x1 - 3.89531 x2 - 5.94472 x3 + 17.23215 x4 - 10.19191 x5 + 12.60276 x6 - 32.83581 x7
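To see how the two functions are used, the sketch below evaluates W1 and W2 for the sample to be classified and assigns it to the class with the larger score. With equal priors the posterior probability of class 2 is taken as exp(W2) / (exp(W1) + exp(W2)); the function names are hypothetical.

```python
import math

# Constants and coefficients of the linear discriminant functions (from the output).
CONST = {1: -206.18758, 2: -382.57458}
COEF = {
    1: [16.60240, -2.77150, -5.80267, 14.17359, -8.00073, 7.49174, -22.87514],
    2: [23.14210, -3.89531, -5.94472, 17.23215, -10.19191, 12.60276, -32.83581],
}


def score(group, x):
    """Value of the linear discriminant function of one class at point x."""
    return CONST[group] + sum(c * v for c, v in zip(COEF[group], x))


def classify(x):
    """Assign x to the class whose discriminant function is largest."""
    return max(COEF, key=lambda g: score(g, x))


sample = [64.34, 8.00, 22.22, 20.06, 15.12, 0.72, 22.89]   # sample 16
w1, w2 = score(1, sample), score(2, sample)
posterior_2 = 1.0 / (1.0 + math.exp(w1 - w2))              # equal priors assumed

print(f"W1={w1:.3f} W2={w2:.3f} class={classify(sample)} P(2|x)={posterior_2:.4f}")
```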
Posterior Probability of Membership in xy
        From    Classified
Obs     xy      into xy          1         2
  1     1       1             1.0000    0.0000
  2     1       1             1.0000    0.0000
  3     1       1             1.0000    0.0000
  4     1       1             1.0000    0.0000
  5     1       1             1.0000    0.0000
  6     1       2 *           0.0000    1.0000
  7     1       1             1.0000    0.0000
  8     1       1             1.0000    0.0000
  9     1       1             1.0000    0.0000
 10     2       2             0.0000    1.0000
 11     2       2             0.0000    1.0000
 12     2       2             0.0000    1.0000
 13     2       1 *           1.0000    0.0000
 14     2       2             0.0000    1.0000
 15     2       2             0.0000    1.0000
* Misclassified observation
Misclassification rate by cross-validation: P = 2/15 = 13.33%
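The rate 2/15 is the raw proportion of misclassified observations in the table above. As a hedged aside, the sketch below also computes the per-class rates and their prior-weighted average; with equal priors this weighted figure (about 13.89%) is what the total error-rate line of PROC DISCRIM is understood to report, which is an assumption here and not something shown in this document.

```python
# (true class, assigned class) for the 15 observations in the table above.
results = ([(1, 1)] * 5 + [(1, 2)] + [(1, 1)] * 3
           + [(2, 2)] * 3 + [(2, 1)] + [(2, 2)] * 2)

raw_rate = sum(1 for true, pred in results if true != pred) / len(results)

class_rates = {}
for g in (1, 2):
    members = [(t, p) for t, p in results if t == g]
    class_rates[g] = sum(1 for t, p in members if t != p) / len(members)

priors = {1: 0.5, 2: 0.5}                                  # PRIORS EQUAL
weighted_rate = sum(priors[g] * class_rates[g] for g in priors)

print(f"raw={raw_rate:.4f} per-class={class_rates} weighted={weighted_rate:.4f}")
```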
(3) Classification of the sample to be classified
Posterior Probability of Membership in xy
        Classified
Obs     into xy          1         2
  1     2             0.0000    1.0000
The sample to be classified is assigned to class 2.
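Question 7: (15 points) Using the data of the previous question (16 samples in total), carry out a cluster analysis:
(1) Single linkage: write out the clustering process and draw the dendrogram (take nclusters=4);
(2) Complete linkage: write out the clustering process, draw the dendrogram (take nclusters=4), and obtain the four clustering statistics;
(3) Give the result of the fast clustering method with 3 clusters, and plot the classification in a plane coordinate system.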
data t7;
  input x1-x7;
cards;
36.05 7.13 7.75 16.67 11.68 2.38 12.88
37.69 7.01 8.94 16.15 11.08 0.83 11.67
38.69 6.01 8.82 14.79 11.44 1.74 13.23
37.75 9.61 8.49 13.15 9.76 1.28 11.28
35.71 8.04 8.31 15.13 7.76 1.41 13.25
39.77 8.49 12.94 19.27 11.05 2.04 13.29
40.91 7.32 8.94 17.60 12.75 1.14 14.80
33.70 7.59 10.98 18.82 14.73 1.78 10.10
35.02 4.72 6.28 10.03 7.15 1.93 10.39
52.41 7.70 9.98 12.53 11.70 2.31 14.69
52.65 3.84 9.16 13.03 15.26 1.98 14.57
55.85 5.50 7.45 9.55 9.52 2.21 16.30
44.68 7.32 14.51 17.13 12.08 1.26 11.57
45.79 7.66 10.36 16.56 12.86 2.75 11.69
50.37 11.35 13.30 19.25 14.59 2.75 14.87
64.34 8.00 22.22 20.06 15.12 0.72 22.89
;
/* (1) single linkage on standardized variables */
proc cluster data=t7 method=sin std nonorm outtree=tree1;
  var x1-x7;
run;
proc tree data=tree1 graphics horizontal out=c1 nclusters=4;
run;
proc print data=c1;
run;
/* (2) complete linkage on standardized variables */
proc cluster data=t7 method=com std nonorm outtree=tree2;
  var x1-x7;
run;
proc tree data=tree2 graphics horizontal out=c2 nclusters=4;
run;
proc print data=c2;
run;
/* (3) fast clustering into 3 clusters; the original read data=t6 (15 samples),
   t7 is used here because the question asks for all 16 samples */
proc fastclus data=t7 maxc=3 distance list cluster=c out=d;
  var x1-x7;
run;
proc plot data=d;
  plot x2*x1=c;
run;
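(1) Single linkage: clustering process and dendrogram (nclusters=4)
Cluster History (the cluster names are badly damaged in the source; they are restored here from the legible fragments, the FREQ column, and the leaf order of the dendrogram)
                                     Min
NCL    Clusters Joined     FREQ     Dist
 15    OB1      OB3           2    1.3976
 14    OB2      OB7           2    1.4581
 13    CL15     OB14          3    1.525
 12    CL13     CL14          5    1.5721
 11    OB4      OB5           2    1.6783
 10    CL12     CL11          7    1.8356
  9    OB6      OB13          2    1.8609
  8    CL10     CL9           9    1.865
  7    CL8      OB10         10    1.9501
  6    CL7      OB12         11    2.0097
  5    CL6      OB8          12    2.126
  4    CL5      OB11         13    2.6429
  3    CL4      OB9          14    2.707
  2    CL3      OB15         15    2.7151
  1    CL2      OB16         16    5.0941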
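The single-linkage process can be followed step by step with a naive sketch in plain Python. It assumes what the STD and NONORM options are understood to do: each variable is standardized to mean 0 and standard deviation 1 (divisor n - 1), and raw Euclidean distances are used. The function names are hypothetical, and the O(n³) search is only suitable for small data such as these 16 samples.

```python
import math

data = [
    [36.05, 7.13, 7.75, 16.67, 11.68, 2.38, 12.88],
    [37.69, 7.01, 8.94, 16.15, 11.08, 0.83, 11.67],
    [38.69, 6.01, 8.82, 14.79, 11.44, 1.74, 13.23],
    [37.75, 9.61, 8.49, 13.15, 9.76, 1.28, 11.28],
    [35.71, 8.04, 8.31, 15.13, 7.76, 1.41, 13.25],
    [39.77, 8.49, 12.94, 19.27, 11.05, 2.04, 13.29],
    [40.91, 7.32, 8.94, 17.60, 12.75, 1.14, 14.80],
    [33.70, 7.59, 10.98, 18.82, 14.73, 1.78, 10.10],
    [35.02, 4.72, 6.28, 10.03, 7.15, 1.93, 10.39],
    [52.41, 7.70, 9.98, 12.53, 11.70, 2.31, 14.69],
    [52.65, 3.84, 9.16, 13.03, 15.26, 1.98, 14.57],
    [55.85, 5.50, 7.45, 9.55, 9.52, 2.21, 16.30],
    [44.68, 7.32, 14.51, 17.13, 12.08, 1.26, 11.57],
    [45.79, 7.66, 10.36, 16.56, 12.86, 2.75, 11.69],
    [50.37, 11.35, 13.30, 19.25, 14.59, 2.75, 14.87],
    [64.34, 8.00, 22.22, 20.06, 15.12, 0.72, 22.89],
]


def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def single_linkage(points):
    """Return the merge history as (clusters left, members of new cluster, distance)."""
    clusters = {i: [i] for i in range(len(points))}
    history = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for ai, a in enumerate(keys):
            for b in keys[ai + 1:]:
                # single linkage: distance between the closest pair of members
                d = min(math.dist(points[p], points[q])
                        for p in clusters[a] for q in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
        history.append((len(clusters), sorted(i + 1 for i in clusters[a]), d))
    return history


history = single_linkage(standardize(data))
for ncl, members, d in history:
    print(f"NCL={ncl:2d}  dist={d:.4f}  members={members}")
```

Printing the history lets each merge distance be compared with the Min Dist column of the SAS cluster history; cutting the history after the step with NCL=4 gives the four-cluster solution.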
Dendrogram: [figure not reproduced]. Leaf order, as far as the source shows it: OB1, OB3, OB14, OB2, OB7, OB4, OB5, OB6, OB13, OB10, OB12, OB8, OB11.