WEKA详细介绍PPT课件_第1页
WEKA详细介绍PPT课件_第2页
WEKA详细介绍PPT课件_第3页
WEKA详细介绍PPT课件_第4页
WEKA详细介绍PPT课件_第5页
已阅读5页,还剩80页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、.,1,Contributed by Yizhou Sun 2008,An Introduction to WEKA,23.06.2020,.,2,Content,What is WEKA? The Explorer: Preprocess data Classification Clustering Association Rules Attribute Selection Data Visualization References and Resources,23.06.2020,.,3,What is WEKA?,Waikato Environment for Knowledge Ana

2、lysis Its a data mining/machine learning tool developed by Department of Computer Science, University of Waikato, New Zealand. Weka is also a bird found only on the islands of New Zealand.,23.06.2020,.,4,Download and Install WEKA,Website: http:/www.cs.waikato.ac.nz/ml/weka/index.html Support multipl

3、e platforms (written in java): Windows, Mac OS X and Linux,23.06.2020,.,5,Main Features,49 data preprocessing tools 76 classification/regression algorithms 8 clustering algorithms 3 algorithms for finding association rules 15 attribute/subset evaluators + 10 search algorithms for feature selection,2

4、3.06.2020,.,6,Main GUI,Three graphical user interfaces “The Explorer” (exploratory data analysis) “The Experimenter” (experimental environment) “The KnowledgeFlow” (new process model inspired interface),23.06.2020,.,7,Content,What is WEKA? The Explorer: Preprocess data Classification Clustering Asso

5、ciation Rules Attribute Selection Data Visualization References and Resources,23.06.2020,.,8,Explorer: pre-processing the data,Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary Data can also be read from a URL or from an SQL database (using JDBC) Pre-processing tools in WE

6、KA are called “filters” WEKA contains filters for: Discretization, normalization, resampling, attribute selection, transforming and combining attributes, ,23.06.2020,.,9,relation heart-disease-simplified attribute age numeric attribute sex female, male attribute chest_pain_type typ_angina, asympt, n

7、on_anginal, atyp_angina attribute cholesterol numeric attribute exercise_induced_angina no, yes attribute class present, not_present data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present .,WEKA only deals with

8、“flat” files,Flat file in ARFF format,23.06.2020,.,10,relation heart-disease-simplified attribute age numeric attribute sex female, male attribute chest_pain_type typ_angina, asympt, non_anginal, atyp_angina attribute cholesterol numeric attribute exercise_induced_angina no, yes attribute class pres

9、ent, not_present data 63,male,typ_angina,233,no,not_present 67,male,asympt,286,yes,present 67,male,asympt,229,yes,present 38,female,non_anginal,?,no,not_present .,WEKA only deals with “flat” files,numeric attribute,nominal attribute,23.06.2020,.,11,23.06.2020,.,12,23.06.2020,.,13,23.06.2020,.,14,23.

10、06.2020,.,15,23.06.2020,.,16,23.06.2020,.,17,23.06.2020,.,18,23.06.2020,.,19,23.06.2020,.,20,23.06.2020,.,21,23.06.2020,.,22,23.06.2020,.,23,23.06.2020,.,24,23.06.2020,.,25,23.06.2020,.,26,23.06.2020,.,27,23.06.2020,.,28,23.06.2020,.,29,23.06.2020,.,30,23.06.2020,.,31,23.06.2020,.,32,Explorer: build

11、ing “classifiers”,Classifiers in WEKA are models for predicting nominal or numeric quantities Implemented learning schemes include: Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets, ,23.06.2020,.,33,This follows a

12、n example of Quinlans ID3 (Playing Tennis),Decision Tree Induction: Training Dataset,23.06.2020,.,34,Output: A Decision Tree for “buys_computer”,23.06.2020,.,35,Basic algorithm (a greedy algorithm) Tree is constructed in a top-down recursive divide-and-conquer manner At start, all the training examp

13、les are at the root Attributes are categorical (if continuous-valued, they are discretized in advance) Examples are partitioned recursively based on selected attributes Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain),Algorithm for Decision Tre

14、e Induction,23.06.2020,.,36,23.06.2020,.,37,23.06.2020,.,38,23.06.2020,.,39,23.06.2020,.,40,23.06.2020,.,41,23.06.2020,.,42,23.06.2020,.,43,23.06.2020,.,44,23.06.2020,.,45,23.06.2020,.,46,23.06.2020,.,47,23.06.2020,.,48,23.06.2020,.,49,23.06.2020,.,50,23.06.2020,.,51,23.06.2020,.,52,23.06.2020,.,53,

15、23.06.2020,.,54,23.06.2020,.,55,23.06.2020,.,56,23.06.2020,.,57,23.06.2020,.,58,Explorer: clustering data,WEKA contains “clusterers” for finding groups of similar instances in a dataset Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst Clusters can be visualized and compared to “t

16、rue” clusters (if given) Evaluation based on loglikelihood if clustering scheme produces a probability distribution,23.06.2020,.,59,Explorer: finding associations,WEKA contains an implementation of the Apriori algorithm for learning association rules Works only with discrete data Can identify statis

17、tical dependencies between groups of attributes: milk, butter bread, eggs (with confidence 0.9 and support 2000) Apriori can compute all rules that have a given minimum support and exceed a given confidence,23.06.2020,.,60,23.06.2020,.,61,23.06.2020,.,62,23.06.2020,.,63,23.06.2020,.,64,23.06.2020,.,

18、65,Explorer: attribute selection,Panel that can be used to investigate which (subsets of) attributes are the most predictive ones Attribute selection methods contain two parts: A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking An evaluation method: correl

19、ation-based, wrapper, information gain, chi-squared, Very flexible: WEKA allows (almost) arbitrary combinations of these two,23.06.2020,.,66,23.06.2020,.,67,23.06.2020,.,68,23.06.2020,.,69,23.06.2020,.,70,23.06.2020,.,71,23.06.2020,.,72,23.06.2020,.,73,23.06.2020,.,74,Explorer: data visualization,Vi

20、sualization very useful in practice: e.g. helps to determine difficulty of the learning problem WEKA can visualize single attributes (1-d) and pairs of attributes (2-d) To do: rotating 3-d visualizations (Xgobi-style) Color-coded class values “Jitter” option to deal with nominal attributes (and to detect “hidden” data points) “Zoom-in” function,23.06.2020

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论