Weka Data Analysis Experiment

1. Experiment Overview
Using the tool Weka 3.6, we run tests on a data sample. The classification methods tested are naive Bayes, decision tree, and random tree; the clustering methods tested are DBSCAN and k-means.

2. Data Sample
The goal is to become familiar with the commonly used data classification algorithms and to learn how to use Weka. The data sample used in this experiment is the "vote" sample bundled with Weka, as shown in the figure:

3. Association Rule Analysis
1) Procedure:
a) Click the "Explorer" button to open the "Weka Explorer" control window;
b) Select the "Associate" tab;
c) Click the "Choose" button and select the "Apriori" rule learner;
d) Click the parameter text box and set the parameters in the options dialog, as shown:
e) Click the "Start" button on the left.
2) Results:
=== Run information ===

Scheme:       weka.associations.Apriori -I -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.5 -S -1.0 -c -1
Relation:     vote
Instances:    435
Attributes:   17
              handicapped-infants
              water-project-cost-sharing
              adoption-of-the-budget-resolution
              physician-fee-freeze
              el-salvador-aid
              religious-groups-in-schools
              anti-satellite-test-ban
              aid-to-nicaraguan-contras
              mx-missile
              immigration
              synfuels-corporation-cutback
              education-spending
              superfund-right-to-sue
              crime
              duty-free-exports
              export-administration-act-south-africa
              class

=== Associator model (full training set) ===

Apriori
=======

Minimum support: 0.5 (218 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 10

Generated sets of large itemsets:

Size of set of large itemsets L(1): 12

Large itemsets L(1):
handicapped-infants=n 236
adoption-of-the-budget-resolution=y 253
physician-fee-freeze=n 247
religious-groups-in-schools=y 272
anti-satellite-test-ban=y 239
aid-to-nicaraguan-contras=y 242
synfuels-corporation-cutback=n 264
education-spending=n 233
crime=y 248
duty-free-exports=n 233
export-administration-act-south-africa=y 269
class=democrat 267

Size of set of large itemsets L(2): 4

Large itemsets L(2):
adoption-of-the-budget-resolution=y physician-fee-freeze=n 219
adoption-of-the-budget-resolution=y class=democrat 231
physician-fee-freeze=n class=democrat 245
aid-to-nicaraguan-contras=y class=democrat 218

Size of set of large itemsets L(3): 1

Large itemsets L(3):
adoption-of-the-budget-resolution=y physician-fee-freeze=n class=democrat 219

Best rules found:

 1. adoption-of-the-budget-resolution=y physician-fee-freeze=n 219 ==> class=democrat 219    conf:(1)
 2. physician-fee-freeze=n 247 ==> class=democrat 245    conf:(0.99)
 3. adoption-of-the-budget-resolution=y class=democrat 231 ==> physician-fee-freeze=n 219    conf:(0.95)
 4. class=democrat 267 ==> physician-fee-freeze=n 245    conf:(0.92)
 5. adoption-of-the-budget-resolution=y 253 ==> class=democrat 231    conf:(0.91)
 6. aid-to-nicaraguan-contras=y 242 ==> class=democrat 218    conf:(0.9)

3) Analysis:
a) The sample data contains 435 records and 17 attributes. Apriori performed 10 cycles: with -U 1.0, -D 0.05 and -M 0.5, the minimum support is lowered from 1.0 in steps of 0.05 until it reaches 0.5;
b) The minimum support is 0.5, i.e. an itemset must occur in at least 218 of the 435 instances;
c) The minimum confidence is 0.9, and 6 rules meet it;
d) The search found 12 frequent 1-itemsets, 4 frequent 2-itemsets, and 1 frequent 3-itemset.

4. Classification Algorithm: Random Tree Analysis
1) Procedure:
a) Click the "Explorer" button to open the "Weka Explorer" control window;
b) Select the "Classify" tab;
c) Click the "Choose" button and select "trees" > "RandomTree";
d) Set Cross-validation to 10 folds;
e) Click the "Start" button on the left.
2) Results:
=== Run information ===

Scheme:       weka.classifiers.trees.RandomTree -K 0 -M 1.0 -S 1
Relation:     vote
Instances:    435
Attributes:   17
              handicapped-infants
              water-project-cost-sharing
              adoption-of-the-budget-resolution
              physician-fee-freeze
              el-salvador-aid
              religious-groups-in-schools
              anti-satellite-test-ban
              aid-to-nicaraguan-contras
              mx-missile
              immigration
              synfuels-corporation-cutback
              education-spending
              superfund-right-to-sue
              crime
              duty-free-exports
              export-administration-act-south-africa
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

RandomTree
==========

el-salvador-aid = n
|   physician-fee-freeze = n
|   |   duty-free-exports = n
|   |   |   anti-satellite-test-ban = n
|   |   |   |   synfuels-corporation-cutback = n
|   |   |   |   |   crime = n : republican (0.96/0)
|   |   |   |   |   crime = y
|   |   |   |   |   |   handicapped-infants = n : democrat (2.02/0.01)
|   |   |   |   |   |   handicapped-infants = y : democrat (0.05/0)
|   |   |   |   synfuels-corporation-cutback = y
|   |   |   |   |   handicapped-infants = n : democrat (0.79/0.01)
|   |   |   |   |   handicapped-infants = y : democrat (2.12/0)
|   |   |   anti-satellite-test-ban = y
|   |   |   |   adoption-of-the-budget-resolution = n
|   |   |   |   |   handicapped-infants = n : democrat (1.26/0.01)
|   |   |   |   |   handicapped-infants = y : republican (1.25/0.25)
|   |   |   |   adoption-of-the-budget-resolution = y
|   |   |   |   |   handicapped-infants = n
|   |   |   |   |   |   crime = n : democrat (5.94/0.01)
|   |   |   |   |   |   crime = y : democrat (5.15/0.12)
|   |   |   |   |   handicapped-infants = y : democrat (36.99/0.09)
|   |   duty-free-exports = y
|   |   |   crime = n : democrat (124.23/0.29)
|   |   |   crime = y
|   |   |   |   handicapped-infants = n : democrat (16.9/0.38)
|   |   |   |   handicapped-infants = y : democrat (8.99/0.02)
|   physician-fee-freeze = y
|   |   immigration = n
|   |   |   education-spending = n
|   |   |   |   crime = n : democrat (1.09/0)
|   |   |   |   crime = y : democrat (1.01/0.01)
|   |   |   education-spending = y : republican (1.06/0.02)
|   |   immigration = y
|   |   |   synfuels-corporation-cutback = n
|   |   |   |   religious-groups-in-schools = n : republican (3.02/0.01)
|   |   |   |   religious-groups-in-schools = y : republican (1.54/0.04)
|   |   |   synfuels-corporation-cutback = y : republican (1.06/0.05)
el-salvador-aid = y
|   synfuels-corporation-cutback = n
|   |   physician-fee-freeze = n
|   |   |   handicapped-infants = n
|   |   |   |   superfund-right-to-sue = n
|   |   |   |   |   crime = n : democrat (1.36/0)
|   |   |   |   |   crime = y
|   |   |   |   |   |   mx-missile = n : republican (1.01/0)
|   |   |   |   |   |   mx-missile = y : democrat (1.01/0.01)
|   |   |   |   superfund-right-to-sue = y : democrat (4.83/0.03)
|   |   |   handicapped-infants = y : democrat (8.42/0.02)
|   |   physician-fee-freeze = y
|   |   |   adoption-of-the-budget-resolution = n
|   |   |   |   export-administration-act-south-africa = n
|   |   |   |   |   mx-missile = n : republican (49.03/0)
|   |   |   |   |   mx-missile = y : democrat (0.11/0)
|   |   |   |   export-administration-act-south-africa = y
|   |   |   |   |   duty-free-exports = n
|   |   |   |   |   |   mx-missile = n : republican (60.67/0)
|   |   |   |   |   |   mx-missile = y : republican (6.21/0.15)
|   |   |   |   |   duty-free-exports = y
|   |   |   |   |   |   aid-to-nicaraguan-contras = n
|   |   |   |   |   |   |   water-project-cost-sharing = n
|   |   |   |   |   |   |   |   mx-missile = n : republican (3.12/0)
|   |   |   |   |   |   |   |   mx-missile = y : democrat (0.01/0)
|   |   |   |   |   |   |   water-project-cost-sharing = y : democrat (1.15/0.14)
|   |   |   |   |   |   aid-to-nicaraguan-contras = y : republican (0.16/0)
|   |   |   adoption-of-the-budget-resolution = y
|   |   |   |   anti-satellite-test-ban = n
|   |   |   |   |   immigration = n : democrat (2.01/0.01)
|   |   |   |   |   immigration = y
|   |   |   |   |   |   water-project-cost-sharing = n
|   |   |   |   |   |   |   mx-missile = n : republican (1.63/0)
|   |   |   |   |   |   |   mx-missile = y : republican (1.01/0.01)
|   |   |   |   |   |   water-project-cost-sharing = y
|   |   |   |   |   |   |   superfund-right-to-sue = n : republican (0.45/0)
|   |   |   |   |   |   |   superfund-right-to-sue = y : republican (1.71/0.64)
|   |   |   |   anti-satellite-test-ban = y
|   |   |   |   |   mx-missile = n : republican (7.74/0)
|   |   |   |   |   mx-missile = y : republican (4.05/0.03)
|   synfuels-corporation-cutback = y
|   |   adoption-of-the-budget-resolution = n
|   |   |   superfund-right-to-sue = n
|   |   |   |   anti-satellite-test-ban = n
|   |   |   |   |   physician-fee-freeze = n : democrat (1.39/0.01)
|   |   |   |   |   physician-fee-freeze = y
|   |   |   |   |   |   water-project-cost-sharing = n : republican (1.01/0)
|   |   |   |   |   |   water-project-cost-sharing = y : democrat (1.05/0.05)
|   |   |   |   anti-satellite-test-ban = y : democrat (1.13/0.01)
|   |   |   superfund-right-to-sue = y
|   |   |   |   education-spending = n
|   |   |   |   |   physician-fee-freeze = n
|   |   |   |   |   |   crime = n : democrat (0.09/0)
|   |   |   |   |   |   crime = y
|   |   |   |   |   |   |   handicapped-infants = n : democrat (1.01/0.01)
|   |   |   |   |   |   |   handicapped-infants = y : democrat (1/0)
|   |   |   |   |   physician-fee-freeze = y
|   |   |   |   |   |   immigration = n
|   |   |   |   |   |   |   export-administration-act-south-africa = n : democrat (0.34/0.11)
|   |   |   |   |   |   |   export-administration-act-south-africa = y
|   |   |   |   |   |   |   |   crime = n : democrat (0.16/0)
|   |   |   |   |   |   |   |   crime = y
|   |   |   |   |   |   |   |   |   mx-missile = n
|   |   |   |   |   |   |   |   |   |   handicapped-infants = n : republican (0.29/0)
|   |   |   |   |   |   |   |   |   |   handicapped-infants = y : republican (1.88/0.87)
|   |   |   |   |   |   |   |   |   mx-missile = y : democrat (0.01/0)
|   |   |   |   |   |   immigration = y : republican (1.01/0)
|   |   |   |   education-spending = y
|   |   |   |   |   physician-fee-freeze = n
|   |   |   |   |   |   handicapped-infants = n : democrat (1.51/0.01)
|   |   |   |   |   |   handicapped-infants = y : democrat (2.01/0)
|   |   |   |   |   physician-fee-freeze = y
|   |   |   |   |   |   crime = n : republican (1.02/0)
|   |   |   |   |   |   crime = y
|   |   |   |   |   |   |   export-administration-act-south-africa = n
|   |   |   |   |   |   |   |   handicapped-infants = n
|   |   |   |   |   |   |   |   |   immigration = n
|   |   |   |   |   |   |   |   |   |   mx-missile = n
|   |   |   |   |   |   |   |   |   |   |   water-project-cost-sharing = n : democrat (1.01/0.01)
|   |   |   |   |   |   |   |   |   |   |   water-project-cost-sharing = y : republican (1.81/0)
|   |   |   |   |   |   |   |   |   |   mx-missile = y : democrat (0.01/0)
|   |   |   |   |   |   |   |   |   immigration = y
|   |   |   |   |   |   |   |   |   |   mx-missile = n : republican (2.78/0)
|   |   |   |   |   |   |   |   |   |   mx-missile = y : democrat (0.01/0)
|   |   |   |   |   |   |   |   handicapped-infants = y
|   |   |   |   |   |   |   |   |   mx-missile = n : republican (2/0)
|   |   |   |   |   |   |   |   |   mx-missile = y : democrat (0.4/0)
|   |   |   |   |   |   |   export-administration-act-south-africa = y
|   |   |   |   |   |   |   |   mx-missile = n : republican (8.77/0)
|   |   |   |   |   |   |   |   mx-missile = y : democrat (0.02/0)
|   |   adoption-of-the-budget-resolution = y
|   |   |   anti-satellite-test-ban = n
|   |   |   |   handicapped-infants = n
|   |   |   |   |   crime = n : democrat (2.52/0.01)
|   |   |   |   |   crime = y : democrat (7.65/0.07)
|   |   |   |   handicapped-infants = y : democrat (10.83/0.02)
|   |   |   anti-satellite-test-ban = y
|   |   |   |   physician-fee-freeze = n
|   |   |   |   |   handicapped-infants = n
|   |   |   |   |   |   crime = n : democrat (2.42/0.01)
|   |   |   |   |   |   crime = y : democrat (2.28/0.03)
|   |   |   |   |   handicapped-infants = y : democrat (4.17/0.01)
|   |   |   |   physician-fee-freeze = y
|   |   |   |   |   mx-missile = n : republican (2.3/0)
|   |   |   |   |   mx-missile = y : democrat (0.01/0)

Size of the tree : 143

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         407               93.5632 %
Incorrectly Classified Instances        28                6.4368 %
Kappa statistic                          0.8636
Mean absolute error                      0.0699
Root mean squared error                  0.2379
Relative absolute error                 14.7341 %
Root relative squared error             48.8605 %
Total Number of Instances              435

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.955     0.095      0.941     0.955      0.948      0.966    democrat
                 0.905     0.045      0.927     0.905      0.916      0.967    republican
Weighted Avg.    0.936     0.076      0.936     0.936      0.935      0.966

=== Confusion Matrix ===

   a   b   <-- classified as
 255  12 |   a = democrat
  16 152 |   b = republican

3) Analysis:
a) The sample data contains 435 records and 17 attributes, evaluated with 10-fold cross-validation;
b) The random tree has a size of 143 nodes;
c) 407 instances were classified correctly, an accuracy of 93.5632 %;
d) 28 instances were misclassified, an error rate of 6.4368 %;
e) The cross-validated accuracy is good (kappa statistic 0.8636).

5. Classification Algorithm: Decision Tree (J48) Analysis
1) Procedure:
a) Click the "Explorer" button to open the "Weka Explorer" control window;
b) Select the "Classify" tab;
c) Click the "Choose" button and select "trees" > "J48";
d) Set Cross-validation to 10 folds;
e) Click the "Start" button on the left.
2) Results:
=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     vote
Instances:    435
Attributes:   17
              handicapped-infants
              water-project-cost-sharing
              adoption-of-the-budget-resolution
              physician-fee-freeze
              el-salvador-aid
              religious-groups-in-schools
              anti-satellite-test-ban
              aid-to-nicaraguan-contras
              mx-missile
              immigration
              synfuels-corporation-cutback
              education-spending
              superfund-right-to-sue
              crime
              duty-free-exports
              export-administration-act-south-africa
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

physician-fee-freeze = n: democrat (253.41/3.75)
physician-fee-freeze = y
|   synfuels-corporation-cutback = n: republican (145.71/4.0)
|   synfuels-corporation-cutback = y
|   |   mx-missile = n
|   |   |   adoption-of-the-budget-resolution = n: republican (22.61/3.32)
|   |   |   adoption-of-the-budget-resolution = y
|   |   |   |   anti-satellite-test-ban = n: democrat (5.04/0.02)
|   |   |   |   anti-satellite-test-ban = y: republican (2.21)
|   |   mx-missile = y: democrat (6.03/1.03)

Number of Leaves  : 6

Size of the tree : 11

Time taken to build model: 0.06 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         419               96.3218 %
Incorrectly Classified Instances        16                3.6782 %
Kappa statistic                          0.9224
Mean absolute error                      0.0611
Root mean squared error                  0.1748
Relative absolute error                 12.887  %
Root relative squared error             35.9085 %
Total Number of Instances              435

=== Detailed Accuracy By Class ===

               TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
                 0.97      0.048      0.97      0.97       0.97       0.971    democrat
                 0.952     0.03       0.952     0.952      0.952      0.971    republican
Weighted Avg.    0.963     0.041      0.963     0.963      0.963      0.971

=== Confusion Matrix ===

   a   b   <-- classified as
 259   8 |   a = democrat
   8 160 |   b = republican

3) Analysis:
a) Sample data: 435 data records, 17
