data:image/s3,"s3://crabby-images/2eb88/2eb8892531814e27eddd4cf2c94c8e5fe339d7ea" alt="WEKA详细介绍PPT课件_第1页"
data:image/s3,"s3://crabby-images/29832/29832c937bc468696d293858db1b43752f2d2390" alt="WEKA详细介绍PPT课件_第2页"
data:image/s3,"s3://crabby-images/3b8f4/3b8f4ee4059e2fabfdf842dc042c75ca7d042b22" alt="WEKA详细介绍PPT课件_第3页"
data:image/s3,"s3://crabby-images/c4ff2/c4ff220515de4563b645a348075abe91163dd1b9" alt="WEKA详细介绍PPT课件_第4页"
data:image/s3,"s3://crabby-images/80844/80844180431ad56ef6cd18e56623fc9c3b0ef4e0" alt="WEKA详细介绍PPT课件_第5页"
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、.,1,Contributed by Yizhou Sun 2008,An Introduction to WEKA,23.06.2020,.,2,Content,What is WEKA? The Explorer: Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources,23.06.2020,.,3,What is WEKA?,Waikato Environment for Knowledge Ana
2、lysis Its a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand.,23.06.2020,.,4,Download and Install WEKA,Website: http:/www.cs.waikato.ac.nz/ml/weka/index.html Support multipl
3、e platforms (written in java): Windows, Mac OS X and Linux,23.06.2020,.,5,Main Features,49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search algorithms for feature selection,2
4、3.06.2020,.,6,Main GUI,Three graphical user interfaces “The Explorer” (exploratory data analysis) “The Experimenter” (experimental environment) “The KnowledgeFlow” (new process model inspired interface),23.06.2020,.,7,Content,What is WEKA? The Explorer: Preprocess data Classification Clustering Asso
5、ciation Rules Attribute Selection Data Visualization References and Resources,23.06.2020,.,8,Explorer: pre-processing the data,Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WE
6、KA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, ,23.06.2020,.,9,relation heart-disease-simplified attribute age numeric attribute sex female, male attribute chest_pain_type typ_angina, asympt, n
7、on_anginal, atyp_angina attribute cholesterol numeric attribute exercise_induced_angina no, yes attribute class present, not_present data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present .,WEKA only deals with
8、“flat” files,Flat file in ARFF format,23.06.2020,.,10,relation heart-disease-simplified attribute age numeric attribute sex female, male attribute chest_pain_type typ_angina, asympt, non_anginal, atyp_angina attribute cholesterol numeric attribute exercise_induced_angina no, yes attribute class pres
9、ent, not_present data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present .,WEKA only deals with “flat” files,numeric attribute,nominal attribute,23.06.2020,.,11,23.06.2020,.,12,23.06.2020,.,13,23.06.2020,.,14,23.
10、06.2020,.,15,23.06.2020,.,16,23.06.2020,.,17,23.06.2020,.,18,23.06.2020,.,19,23.06.2020,.,20,23.06.2020,.,21,23.06.2020,.,22,23.06.2020,.,23,23.06.2020,.,24,23.06.2020,.,25,23.06.2020,.,26,23.06.2020,.,27,23.06.2020,.,28,23.06.2020,.,29,23.06.2020,.,30,23.06.2020,.,31,23.06.2020,.,32,Explorer: build
11、ing “classifiers”,Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ,23.06.2020,.,33,This follows a
12、n example of Quinlans ID3 (Playing Tennis),Decision Tree Induction: Training Dataset,23.06.2020,.,34,Output: A Decision Tree for “buys_computer”,23.06.2020,.,35,Basic algorithm (a greedy algorithm) Tree is constructed in a top-down recursive divide-and-conquer manner At start, all the training examp
13、les are at the root Attributes are categorical (if continuous-valued, they are discretized in advance) Examples are partitioned recursively based on selected attributes Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain),Algorithm for Decision Tre
14、e Induction,23.06.2020,.,36,23.06.2020,.,37,23.06.2020,.,38,23.06.2020,.,39,23.06.2020,.,40,23.06.2020,.,41,23.06.2020,.,42,23.06.2020,.,43,23.06.2020,.,44,23.06.2020,.,45,23.06.2020,.,46,23.06.2020,.,47,23.06.2020,.,48,23.06.2020,.,49,23.06.2020,.,50,23.06.2020,.,51,23.06.2020,.,52,23.06.2020,.,53,
15、23.06.2020,.,54,23.06.2020,.,55,23.06.2020,.,56,23.06.2020,.,57,23.06.2020,.,58,Explorer: clustering data,WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “t
16、rue” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution,23.06.2020,.,59,Explorer: finding associations,WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statis
17、tical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence,23.06.2020,.,60,23.06.2020,.,61,23.06.2020,.,62,23.06.2020,.,63,23.06.2020,.,64,23.06.2020,.,
18、65,Explorer: attribute selection,Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correl
19、ation-based, wrapper, information gain, chi-squared, Very flexible: WEKA allows (almost) arbitrary combinations of these two,23.06.2020,.,66,23.06.2020,.,67,23.06.2020,.,68,23.06.2020,.,69,23.06.2020,.,70,23.06.2020,.,71,23.06.2020,.,72,23.06.2020,.,73,23.06.2020,.,74,Explorer: data visualization,Vi
20、sualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function,23.06.2020
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 企业人力资源管理师四级练习题库及参考答案
- 环境监测测试题(附参考答案)
- 2025年广西科技职业学院单招职业适应性测试题库附答案
- 2024云南省曲靖市陆良县城乡公交服务有限公司招聘(17人)笔试参考题库附带答案详解
- 机器学习原理与应用电子教案 4图像处理基础
- 第13课《唐诗五首-黄鹤楼、渡荆门送别》教学设计 2024-2025学年统编版语文八年级上册
- 2025年海口经济学院单招职业倾向性测试题库汇编
- 18《浪淘沙(其一)》教学设计-2024-2025学年统编版语文六年级上册
- 2025至2030年中国核相仪数据监测研究报告
- 2025年河北青年管理干部学院单招职业适应性测试题库及答案1套
- 市政质量员继续教育考试题库集(含答案)
- 售后工程师述职报告
- 《公司法完整版》课件2024
- 2024年下半年信息系统项目管理师真题及答案
- 海康威视电力行业系统解决方案
- 2024-2030年中国街舞培训行业发展趋势及竞争格局分析报告
- 期末练习卷(模拟试题)-2024-2025学年 一年级上册数学人教版
- 白血病合并感染
- GB/T 18601-2024天然花岗石建筑板材
- 有机肥配施氮肥对玉米根系生长、氮素利用及产量和品质的影响
- 2024年山西省中考语文试卷
评论
0/150
提交评论