




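HW2
Due Date: Nov. 23

Part I: written assignment

1. a) Compute the Information Gain for Gender, Car Type and Shirt Size.

There are two classes in this problem, C0 and C1, with 10 records each, so

$I(C_0, C_1) = I(10, 10) = 1$

Gender:

$\mathrm{Info}_{Gender}(D) = \frac{10}{20} I(6,4) + \frac{10}{20} I(4,6) = \frac{10}{20}\left(-\frac{6}{10}\log_2\frac{6}{10} - \frac{4}{10}\log_2\frac{4}{10}\right) + \frac{10}{20}\left(-\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10}\right) = 0.971$

$\mathrm{Gain}(Gender) = I(C_0, C_1) - \mathrm{Info}_{Gender}(D) = 1 - 0.971 = 0.029$

Car Type:

$\mathrm{Info}_{CarType}(D) = \frac{4}{20} I(1,3) + \frac{8}{20} I(8,0) + \frac{8}{20} I(1,7) = \frac{4}{20}\left(-\frac{1}{4}\log_2\frac{1}{4} - \frac{3}{4}\log_2\frac{3}{4}\right) + 0 + \frac{8}{20}\left(-\frac{1}{8}\log_2\frac{1}{8} - \frac{7}{8}\log_2\frac{7}{8}\right) = 0.3797$

$\mathrm{Gain}(CarType) = I(C_0, C_1) - \mathrm{Info}_{CarType}(D) = 1 - 0.3797 = 0.6203$

Shirt Size:

$\mathrm{Info}_{ShirtSize}(D) = \frac{5}{20} I(3,2) + \frac{7}{20} I(3,4) + \frac{4}{20} I(2,2) + \frac{4}{20} I(2,2) = \frac{5}{20}\left(-\frac{3}{5}\log_2\frac{3}{5} - \frac{2}{5}\log_2\frac{2}{5}\right) + \frac{7}{20}\left(-\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7}\right) + \frac{8}{20}\left(-\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4}\right) = 0.9876$

$\mathrm{Gain}(ShirtSize) = I(C_0, C_1) - \mathrm{Info}_{ShirtSize}(D) = 1 - 0.9876 = 0.0124$

b) Construct a decision tree with Information Gain.

From (a), Car Type has the largest information gain, so it is chosen as the first splitting attribute. Its values are Luxury, Family and Sports. All Sports records belong to C0, so that branch needs no further splitting.

Splitting Luxury further:

$I(C_0, C_1) = I(1,7) = 0.5436$

$\mathrm{Info}_{Gender}(D) = \frac{1}{8} I(0,1) + \frac{7}{8} I(1,6) = 0 + \frac{7}{8}\left(-\frac{1}{7}\log_2\frac{1}{7} - \frac{6}{7}\log_2\frac{6}{7}\right) = 0.5177$

$\mathrm{Gain}(Gender) = I(C_0, C_1) - \mathrm{Info}_{Gender}(D) = 0.5436 - 0.5177 = 0.0259$

$\mathrm{Info}_{ShirtSize}(D) = \frac{2}{8} I(0,2) + \frac{3}{8} I(0,3) + \frac{2}{8} I(1,1) + \frac{1}{8} I(0,1) = 0.25$

$\mathrm{Gain}(ShirtSize) = I(C_0, C_1) - \mathrm{Info}_{ShirtSize}(D) = 0.5436 - 0.25 = 0.2936$

Shirt Size has the larger gain, so Shirt Size is chosen as the splitting attribute for the Luxury branch.

Splitting Family further:

$I(C_0, C_1) = I(1,3) = 0.811$

$\mathrm{Gain}(Gender) = I(C_0, C_1) - \mathrm{Info}_{Gender}(D) = 0.811 - I(1,3) = 0$

$\mathrm{Gain}(ShirtSize) = I(C_0, C_1) - \mathrm{Info}_{ShirtSize}(D) = 0.811 - \frac{1}{4} I(1,0) - \frac{1}{4} I(0,1) - \frac{1}{4} I(0,1) - \frac{1}{4} I(0,1) = 0.811$

Shirt Size is therefore also chosen as the splitting attribute for the Family branch.

From the calculations above, the decision tree for this problem is as follows:

[Figure: decision tree with Car Type at the root; Sports is a C0 leaf, and the Luxury and Family branches are each split on Shirt Size]

2. (a) Design a multilayer feed-forward neural network (one hidden layer) for the data set in Q1. Label the nodes in the input and output layers.

Given the attributes of the data, the input layer has 8 nodes:

x1: Gender (Gender = M: x1 = 1; Gender = F: x1 = 0)
x2: Car Type = Sports (Y = 1; N = 0)
x3: Car Type = Family (Y = 1; N = 0)
x4: Car Type = Luxury (Y = 1; N = 0)
x5: Shirt Size = Small (Y = 1; N = 0)
x6: Shirt Size = Medium (Y = 1; N = 0)
x7: Shirt Size = Large (Y = 1; N = 0)
x8: Shirt Size = Extra Large (Y = 1; N = 0)

The hidden layer has three nodes, x9, x10 and x11. The output is a two-class problem, so the output layer has only one node, x12 (C0 = 1; C1 = 0). The network diagram is as follows:

[Figure: neural network diagram]

Here Wij denotes the weight from input node i to hidden node j; for ease of calculation, the weights from input node i to hidden nodes 9, 10 and 11 are set to the same value. Wi-j denotes the weight from hidden node i to the output node j.

c) Using the neural network obtained above, show the weight values after one iteration of the back propagation algorithm, given the training instance “(M, Family, Small)”. Indicate your initial weight values and biases and the learning rate used.

For (M, Family, Small) the class label is C0, so the target output is T = 1, and the training tuple is (1, 0, 1, 0, 1, 0, 0, 0).

Table 1. Initial inputs, weights, biases and learning rate

x1    x2    x3    x4    x5    x6    x7    x8
1     0     1     0     1     0     0     0

W1j   W2j   W3j   W4j   W5j   W6j   W7j   W8j
0.1   0.2   0.1   0.2   0.1   0.2   0.3   -0.1

W9-12   W10-12   W11-12   θ9    θ10   θ11    θ12   L
0.1     0.2      -0.1     0.1   0.1   -0.1   0.2   0.9

Table 2. Net input and output of each unit

Unit j   Net input Ij                                               Output Oj
9        1*0.1 + 1*0.1 + 1*0.1 + 0.1 = 0.4                          1/(1+e^(-0.4)) = 0.5987
10       1*0.1 + 1*0.1 + 1*0.1 + 0.1 = 0.4                          1/(1+e^(-0.4)) = 0.5987
11       1*0.1 + 1*0.1 + 1*0.1 - 0.1 = 0.2                          1/(1+e^(-0.2)) = 0.5498
12       0.5987*0.1 + 0.5987*0.2 - 0.5498*0.1 + 0.2 = 0.3246        1/(1+e^(-0.3246)) = 0.5805

Only the inputs equal to 1 (x1, x3, x5) contribute to the hidden-layer net inputs, and the net input of unit 12 includes its bias θ12 = 0.2. Intermediate values are carried at full precision and rounded for display.

Table 3. Error at each unit

Unit j   Errj
12       0.5805*(1-0.5805)*(1-0.5805) = 0.1022
11       0.5498*(1-0.5498)*0.1022*(-0.1) = -0.00253
10       0.5987*(1-0.5987)*0.1022*(0.2) = 0.00491
9        0.5987*(1-0.5987)*0.1022*(0.1) = 0.00245

Table 4. Weight and bias updates

Weight or bias   New value
W1,9             0.1 + 0.9*0.00245*1 = 0.1022
W1,10            0.1 + 0.9*0.00491*1 = 0.1044
W1,11            0.1 + 0.9*(-0.00253)*1 = 0.0977
W2,9             0.2 + 0.9*0.00245*0 = 0.2
W2,10            0.2 + 0.9*0.00491*0 = 0.2
W2,11            0.2 + 0.9*(-0.00253)*0 = 0.2
W3,9             0.1 + 0.9*0.00245*1 = 0.1022
W3,10            0.1 + 0.9*0.00491*1 = 0.1044
W3,11            0.1 + 0.9*(-0.00253)*1 = 0.0977
W4,9             0.2 + 0.9*0.00245*0 = 0.2
W4,10            0.2 + 0.9*0.00491*0 = 0.2
W4,11            0.2 + 0.9*(-0.00253)*0 = 0.2
W5,9             0.1 + 0.9*0.00245*1 = 0.1022
W5,10            0.1 + 0.9*0.00491*1 = 0.1044
W5,11            0.1 + 0.9*(-0.00253)*1 = 0.0977
W6,9             0.2 + 0.9*0.00245*0 = 0.2
W6,10            0.2 + 0.9*0.00491*0 = 0.2
W6,11            0.2 + 0.9*(-0.00253)*0 = 0.2
W7,9             0.3 + 0.9*0.00245*0 = 0.3
W7,10            0.3 + 0.9*0.00491*0 = 0.3
W7,11            0.3 + 0.9*(-0.00253)*0 = 0.3
W8,9             -0.1 + 0.9*0.00245*0 = -0.1
W8,10            -0.1 + 0.9*0.00491*0 = -0.1
W8,11            -0.1 + 0.9*(-0.00253)*0 = -0.1
W9-12            0.1 + 0.9*0.1022*0.5987 = 0.1551
W10-12           0.2 + 0.9*0.1022*0.5987 = 0.2551
W11-12           -0.1 + 0.9*0.1022*0.5498 = -0.0494
θ9               0.1 + 0.9*0.00245 = 0.1022
θ10              0.1 + 0.9*0.00491 = 0.1044
θ11              -0.1 + 0.9*(-0.00253) = -0.1023
θ12              0.2 + 0.9*0.1022 = 0.2920

3. a) Suppose the fraction of undergraduate students who smoke is 15% and the fraction of graduate students who smoke is 23%. If one-fifth of the college students are graduate students and the rest are undergraduates, what is the probability that a student who smokes is a graduate student?

Let U stand for undergraduate student, G for graduate student and S for smoking. Then

$P(S|U) = 0.15,\quad P(S|G) = 0.23,\quad P(G) = 0.2,\quad P(U) = 0.8$

By Bayes' theorem,

$P(G|S) = \frac{P(S|G)\,P(G)}{P(S)} = \frac{P(S|G)\,P(G)}{P(S|U)\,P(U) + P(S|G)\,P(G)} = \frac{0.23 \times 0.2}{0.15 \times 0.8 + 0.23 \times 0.2} = 0.277$

b) Given the information in part (a), is a randomly chosen college student more likely to be a graduate or undergraduate student?

Since $P(U) > P(G)$, the student is more likely to be an undergraduate student.

c) Suppose 30% of the graduate students live in a dorm but only 10% of the undergraduate students live in a dorm. If a student smokes and lives in the dorm, is he or she more likely to be a graduate or undergraduate student? You can assume independence between students who live in a dorm and those who smoke.

Let D stand for living in a dorm, so $P(D|U) = 0.1$ and $P(D|G) = 0.3$. Treating D and S as independent within each student group,

$P(G|D,S)\,P(D,S) = P(D,S|G)\,P(G) = P(D|G)\,P(S|G)\,P(G) = 0.3 \times 0.23 \times 0.2 = 0.0138$

$P(U|D,S)\,P(D,S) = P(D,S|U)\,P(U) = P(D|U)\,P(S|U)\,P(U) = 0.1 \times 0.15 \times 0.8 = 0.012$

Since $P(G|D,S)\,P(D,S) > P(U|D,S)\,P(D,S)$, it follows that $P(G|D,S) > P(U|D,S)$, so the student is more likely to be a graduate student.

4. (a) The three cluster center after the first round execution

Round 1: the initial centers are A1(4, 2, 5), B1(1, 1, 1) and C1(11, 9, 2).

Table 1. Coordinates of the points and their distances to the initial centers

Point        A1     A2     A3     B1     B2     B3     C1     C2     C3     C4
x             4     10      5      1      2      3     11      1      9      5
y             2      5      7      1      3      6      9      4      1      6
z             5      2      8      1      2      9      2      6      7      7
d(p, A1)           7.35   5.92          3.74   5.74          3.74   5.48   4.58
d(p, B1)           9.90  10.05          2.45   9.64          5.83  10.00   8.77
d(p, C1)           4.12   8.72         10.82  11.05         11.87   9.64   8.37

Here d(p, A1), d(p, B1) and d(p, C1) are the Euclidean distances from each point to A1, B1 and C1 respectively; the same notation is used below. Assigning each point to its nearest center gives:

Cluster 1: A1, A3, B3, C2, C3, C4
Cluster 2: B1, B2
Cluster 3: C1, A2

(b) The final three clusters

Round 2: compute the mean of each cluster.

Cluster 1: M1(4.5, 4.33, 7.0)
Cluster 2: M2(1.5, 2, 1.5)
Cluster 3: M3(10.5, 7, 2)

Table 2. Coordinates of the points and their distances to the centers from the first round

Point        A1     A2     A3     B1     B2     B3     C1     C2     C3     C4
x             4     10      5      1      2      3     11      1      9      5
y             2      5      7      1      3      6      9      4      1      6
z             5      2      8      1      2      9      2      6      7      7
d(p, M1)    3.11   7.46   2.89   7.70   5.75   3.00   9.44   3.66   5.60   1.74
d(p, M2)    4.30   9.03   8.92   1.22   1.22   8.63  11.81   4.95   9.35   7.65
d(p, M3)    8.73   2.06   8.14  11.28   9.39  10.31   2.06  10.74   7.95   7.50

The clusters after reassignment are:

Cluster 1: A1, A3, B3, C2, C3, C4
Cluster 2: B1, B2
Cluster 3: C1, A2

Analysis: the second-round clustering is identical to the first-round clustering, so the algorithm stops.

Part II: Lab

Question 1

1. Build a decision tree using data set “transactions” that predicts milk as a function of the other fields. Set the “type” of each field to “Flag”, set the “direction” of “milk” as “out”, set the “type” of COD as “Typeless”, select “Expert” and set the “pruning severity” to 65, and set the “minimum records per child branch” to be 95. Hand-in: A figure showing your tree.

2. Use the model (the full tree generated by Clementine in step 1 above) to make a prediction for each of the 20 customers in the “rollout” data to determine whether the customer would buy milk. Hand-in: your prediction for each of the 20 customers.

According to the program output, customers 2, 3, 4, 5, 9, 10, 13, 14, 17 and 18 will buy milk.

3. Hand-in: rules for positive (yes) prediction of milk purchase identified from the decision tree (up to the fifth level. The root is considered as level 1). Compare with the rules generated by Apriori in Homework 1, and submit your brief comments on the rules (e.g., pruning effect)

Association rules derived from the decision tree:

Table 1. Association rules generated from the decision tree

Consequent   Antecedent1         Antecedent2
milk         Juice
milk         Juice               water
milk         pasta
milk         Juice               pasta
milk         Tomato source
milk         Juice               Tomato source
milk         biscuits
milk         Juice               biscuits
milk         Yoghurt
milk         Yoghurt             water
milk         Yoghurt             biscuits
milk         Brioches
milk         Yoghurt             Brioches
milk         beer
milk         beer                biscuits
milk         rice
milk         beer                rice
milk         Frozen vegetables
milk         Frozen vegetables   biscuits

Table 2. Association rules generated by Apriori

The rules generated from the decision tree are broadly similar to those generated by Apriori. The decision tree yields fewer rules because the missing rules lie at the sixth and seventh levels or deeper, where they were pruned.

Question 2: Churn Management

1. Perform decision tree classification on training data set. Select all the input variables except state, area_code, and phone_number (since they are only informative for this analysis). Set the “Direction” of class as “out”, “type” as “Flag”. Then, specify the “minimum records pe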
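The information-gain arithmetic in Part I, Question 1(a) can be checked with a short script. This is a minimal sketch: the helper names `entropy` and `info_gain` are illustrative, not from the assignment, and the (C0, C1) counts per attribute value are the ones that appear in the I(·,·) terms of the worked solution.

```python
from math import log2

def entropy(counts):
    """Entropy I(c0, c1, ...) in bits for a tuple of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of the partitions."""
    n = sum(parent)
    expected = sum(sum(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - expected

# (C0, C1) counts for each attribute value, taken from Question 1(a).
splits = {
    "Gender": [(6, 4), (4, 6)],
    "CarType": [(1, 3), (8, 0), (1, 7)],
    "ShirtSize": [(3, 2), (3, 4), (2, 2), (2, 2)],
}

gains = {name: info_gain((10, 10), parts) for name, parts in splits.items()}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.4f}")
```

The attribute with the largest value in `gains` is the one the solution selects as the root split. The same `info_gain` call can be reused for the Luxury and Family subsets in Question 1(b) by passing their own parent counts, for example `info_gain((1, 7), [(0, 2), (0, 3), (1, 1), (0, 1)])`.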
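For Part I, Question 2(c), one backpropagation step can be recomputed directly from the stated initial weights, biases and learning rate, which makes it easy to check hand-calculated table entries. The sketch below assumes a sigmoid activation and the update rules Err_j = O_j(1 - O_j)(T - O_j) for the output unit and Err_j = O_j(1 - O_j) Err_12 w_j,12 for hidden units; the variable names are illustrative only.

```python
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

x = [1, 0, 1, 0, 1, 0, 0, 0]      # encoding of (M, Family, Small)
target = 1.0                      # class C0 is encoded as 1
rate = 0.9                        # learning rate L
hidden = [9, 10, 11]

# Input-to-hidden weights: the same values are used for j = 9, 10, 11.
w_in = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2, 0.3, -0.1]
w_hid = {j: list(w_in) for j in hidden}
w_out = {9: 0.1, 10: 0.2, 11: -0.1}           # hidden-to-output weights
bias = {9: 0.1, 10: 0.1, 11: -0.1, 12: 0.2}

# Forward pass.
out = {}
for j in hidden:
    net = sum(w * xi for w, xi in zip(w_hid[j], x)) + bias[j]
    out[j] = sigmoid(net)
out[12] = sigmoid(sum(w_out[j] * out[j] for j in hidden) + bias[12])

# Backward pass: output error first, then hidden errors.
err = {12: out[12] * (1 - out[12]) * (target - out[12])}
for j in hidden:
    err[j] = out[j] * (1 - out[j]) * err[12] * w_out[j]

# Weight and bias updates after one iteration.
new_w_out = {j: w_out[j] + rate * err[12] * out[j] for j in hidden}
new_w_hid = {j: [w + rate * err[j] * xi for w, xi in zip(w_hid[j], x)]
             for j in hidden}
new_bias = {j: b + rate * err[j] for j, b in bias.items()}

for j in hidden + [12]:
    print(f"O{j} = {out[j]:.4f}  Err{j} = {err[j]:.5f}  bias -> {new_bias[j]:.4f}")
```

Weights attached to inputs that are 0 for this tuple are left unchanged by the update, since the correction term is multiplied by the input value.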
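The Bayes computations in Part I, Question 3 follow one pattern: multiply each group's likelihood by its prior, then normalise. A minimal sketch, with the hypothetical helper `posterior` and the dorm/smoking independence applied within each student group as the question allows:

```python
prior = {"G": 0.2, "U": 0.8}          # graduate, undergraduate
p_smoke = {"G": 0.23, "U": 0.15}      # P(S | group)
p_dorm = {"G": 0.30, "U": 0.10}       # P(D | group)

def posterior(likelihood, prior):
    """Return P(group | evidence) from P(evidence | group) and P(group)."""
    joint = {g: likelihood[g] * prior[g] for g in prior}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

# Part (a): P(G | S).
post_smoke = posterior(p_smoke, prior)

# Part (c): P(D, S | group) = P(D | group) * P(S | group) under the
# assumed independence of dorm residence and smoking within each group.
p_both = {g: p_dorm[g] * p_smoke[g] for g in prior}
post_both = posterior(p_both, prior)

print(f"P(G|S)   = {post_smoke['G']:.3f}")
print(f"P(G|D,S) = {post_both['G']:.3f}")
print(f"P(U|D,S) = {post_both['U']:.3f}")
```

Normalising in part (c) is optional for the comparison itself, because both posteriors share the same denominator P(D, S).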
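The two k-means rounds in Part I, Question 4 can be re-run from the ten point coordinates and the initial centers A1, B1 and C1, so that each distance and assignment can be checked. This is a sketch assuming Euclidean distance; `assign` and `means` are illustrative names, and `math.dist` requires Python 3.8 or later.

```python
from math import dist  # Euclidean distance, Python 3.8+

points = {
    "A1": (4, 2, 5), "A2": (10, 5, 2), "A3": (5, 7, 8),
    "B1": (1, 1, 1), "B2": (2, 3, 2), "B3": (3, 6, 9),
    "C1": (11, 9, 2), "C2": (1, 4, 6), "C3": (9, 1, 7), "C4": (5, 6, 7),
}

def assign(centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for name, p in points.items():
        nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
        clusters[nearest].append(name)
    return clusters

def means(clusters):
    """Coordinate-wise mean of the points in each cluster."""
    return [tuple(sum(points[n][i] for n in c) / len(c) for i in range(3))
            for c in clusters]

centers0 = [points["A1"], points["B1"], points["C1"]]
round1 = assign(centers0)       # first-round clusters
centers1 = means(round1)        # centers after the first round
round2 = assign(centers1)       # second-round clusters

for k, (members, center) in enumerate(zip(round2, centers1), start=1):
    rounded = tuple(round(v, 2) for v in center)
    print(f"Cluster {k}: {members}, center {rounded}")
print("converged:", round1 == round2)
```

Comparing `round1` with `round2` is the stopping test used in the written answer: once the assignments no longer change, the centers cannot change either.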