




Clementine 7.0 Software Operation (Middle Part)
Lecture 7

Instructor: Shen Hao
- Associate Professor, School of Journalism and Communication, Beijing Broadcasting Institute
- Deputy Director, Survey and Statistics Research Institute, Beijing Broadcasting Institute
- Chief Technical Consultant, IPSOS (China) Market Research Co., Ltd.

Graph Nodes

The Graphs palette contains the following nodes:
- Plot
- Multiplot
- Distribution
- Histogram
- Collection
- Web
- Evaluation

Slide titles:
- Graph with size overlay
- Graph with panel overlay
- Graph with color overlay
- Graph with color and transparency overlays
- 3-D Graphs
- Animation
- Using Graphs
- Plot Node
- Multiplot Node
- Distribution Node
- Histogram Node
- Collection Node
- Web Node
- Creating a Web Summary

Evaluation Chart Node

Cumulative gains charts always start at 0% and end at 100% as you go from left to right. For a good model,
the gains chart will rise steeply toward 100% and then level off. A model that provides no information will follow the diagonal from lower left to upper right (shown in the chart if Include baseline is selected).

Figure: Gains charts.

Cumulative lift charts tend to start above 1.0 and gradually descend until they reach 1.0 as you go from left to right. The right edge of the chart represents the entire data set, so the ratio of hits in cumulative quantiles to hits in data is 1.0. For a good model, lift should start well above 1.0 on the left, remain on a high plateau as you move to the right, and then trail
off sharply toward 1.0 on the right side of the chart. For a model that provides no information, the line will hover around 1.0 for the entire graph. (If Include baseline is selected, a horizontal line at 1.0 is shown in the chart for reference.)

Figure: Lift charts.

Cumulative response charts tend to be very similar to lift charts except for the scaling. Response charts usually start near 100% and gradually descend until they reach the overall response rate (total hits / total records) on the right edge of the chart. For a good model, the line will start near or at 100% on the left, remain on a high plateau as you move to the right, and then trail off sharply toward the overall response rate on the right side of the chart. For a model that provides no information, the line will hover around the overall response rate for the entire graph. (If Include baseline is selected, a horizontal line at the
overall response rate is shown in the chart for reference.)

Figure: Response charts.

Cumulative profit charts show the sum of profits as you increase the size of the selected sample, moving from left to right. Profit charts usually start near zero, increase steadily as you move to the right until they reach a peak or plateau in the middle, and then decrease toward the right edge of the chart. For a good model, profits will show a well-defined peak somewhere in the middle of the chart. For a model that provides no information, the line will be relatively straight and may be increasing, decreasing, or
level depending on the cost/revenue structure that applies.

Figure: Profit charts.

Profits = Revenue - Cost

Cumulative ROI (return on investment) charts tend to be similar to response charts and lift charts except for the scaling. ROI charts usually start above 0% and gradually descend until they reach the
overall ROI for the entire data set (which can be negative). For a good model, the line should start well above 0%, remain on a high plateau as you move to the right, and then trail off rather sharply toward the overall ROI on the right side of the chart. For a model that provides no information, the line should hover around the overall ROI value.

Figure: ROI charts.

Modeling Nodes

The Modeling palette contains the following nodes:
- Neural Net
- C5.0 (decision tree)
- Kohonen (neural clustering)
- Linear Regression
- Generalized Rule Induction (GRI)
- Apriori (association rules)
- K-Means (fast clustering)
- Logistic Regression
- Factor Analysis/PCA (factor and principal components analysis)
- TwoStep Cluster
- Classification and Regression (C&R) Trees
- Sequence Detection (sequence analysis)

[Figure: modeling diagram with the labels Modeling, Target, Out, In, and Gen-Outcome]

Neural Net Node

Requirements. There are no restrictions on field types. Neural Net nodes can handle numeric, symbolic, or flag inputs and outputs. The Neural Net node expects one or more fields with direction In and one or more fields with direction Out. Fields set to Both or None are ignored. Field types must be fully instantiated when the node is executed.

Strengths. Neural networks are powerful general function estimators. They usually perform prediction tasks at least as well as other techniques and sometimes perform significantly better. They also require minimal statistical or mathematical knowledge to train or apply. Clementine incorporates several features to avoid some of the common pitfalls of neural networks, including sensitivity analysis to aid in interpretation of the network, pruning and validation to prevent overtraining, and dynamic networks to automatically find an appropriate network architecture.

A new weight is derived by taking the old weight and applying an adjustment based on a function of the prediction
error (represented here by dj). The momentum term (D) serves to encourage the weight change to maintain the same direction as the last weight change. It and the learning rate (K) are control parameters that can be modified by experienced neural network practitioners to fine-tune the performance of backpropagation neural networks.

In the standard backpropagation form that matches these definitions, the update is Wji(t+1) = Wji(t) + K * dj * Oi + D * [Wji(t) - Wji(t-1)], where:
- Wji is the weight connecting neuron i to neuron j
- t is the trial number
- K is the learning rate (a value set between 0 and 1)
- dj is the error gradient at node j
- Oi is the activation level of a node (the nonlinear function applied to the result of the combination of the weights and inputs)
- D is a momentum term (a value set between 0 and 1)

A Neural Network Example: Predicting Credit Risk

We will use a neural network to predict the credit risk category into which individuals should be placed.

Outcome field: credit risk, with three categories:
- “good risk”
- “bad risk, but profitable”
- “bad risk with loss”

Predictors:
- marital status
- income
- number of store credit cards
- Age
- number of credit cards
- number of loans
- number of children
- Gender
- Mortgage
- whether salary is weekly or monthly

Because the target variable is categorical, the analysis will substitute three dummy-coded (0,1) fields for the single three-category field. The same is done for the marital status field. Clementine makes this adjustment automatically, based on the field's type.

A Neural Network Stream: hands-on together!

Data files: D:mydataTrain.txt, D:mydataTest.txt

Partial results

Kohonen Node

The Kohonen node is used to create and train a special type of neural network called a Kohonen network, a knet, or a self-organizing map. This type of network can be used to cluster the data set into distinct groups when you don't know what those groups are at the beginning. Unlike most learning methods in
Clementine, Kohonen networks do not use a target field. This type of learning, with no target field, is called unsupervised learning. Instead of trying to predict an outcome, Kohonen nets try to uncover patterns in the set of input fields. Records are grouped so that records within a group or cluster tend to be similar to each other, and records in different groups are dissimilar.

[Figure: input fields (income, Age, Gender, marital status, number of children) mapped to output grid segments Seg11 to Segij]

A Kohonen network consists of an input layer of units and a two-dimensional output grid of processing units. During training, each unit competes with all
of the others to “win” each record. When a unit wins a record, its weights are adjusted to better match the pattern of predictor values for that record. As training proceeds, the weights on the grid units are adjusted so that they form a two-dimensional “map” of the clusters. (Hence the term self-organizing map.)

Usually, a Kohonen net will end up with a few units that summarize many observations (strong units), and several units that don't really correspond to any of the observations (weak units). The strong units represent probable cluster centers.

Another use of Kohonen networks is in dimension reduction. The spatial characteristic of the two-dimensional grid provides a mapping from the k original predictors to two derived features that preserve the similarity relationships in the original predictors. In some cases, this can give you the same kind of benefit as factor analysis or PCA.
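
The competitive training described above can be illustrated with a minimal sketch. This is not Clementine's Kohonen implementation: the function names (`train_som`, `nearest_unit`), the linearly decaying learning rate, the Gaussian neighborhood, and the sample records are all assumptions chosen for illustration, and the inputs are assumed to be numeric and already scaled to [0, 1].

```python
import math
import random


def nearest_unit(weights, record):
    """Return the grid position whose weight vector is closest to the record."""
    def sq_dist(unit):
        return sum((w - x) ** 2 for w, x in zip(weights[unit], record))
    return min(weights, key=sq_dist)


def train_som(records, rows, cols, epochs=100, start_rate=0.5, seed=0):
    """Train a small self-organizing map (hypothetical helper, not a Clementine API).

    Returns a dict mapping each grid position (row, col) to its weight vector.
    """
    rng = random.Random(seed)
    dim = len(records[0])
    # Every grid unit starts with random weights in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    start_radius = max(rows, cols) / 2.0
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs            # shrinks from 1 toward 0
        rate = start_rate * remaining               # assumed linear decay
        radius = max(start_radius * remaining, 0.5)
        for record in rng.sample(records, len(records)):  # shuffled pass
            winner = nearest_unit(weights, record)  # the unit that "wins" the record
            for unit, vector in weights.items():
                grid_sq = (unit[0] - winner[0]) ** 2 + (unit[1] - winner[1]) ** 2
                # The winner is pulled hardest; grid neighbors are pulled less.
                pull = rate * math.exp(-grid_sq / (2.0 * radius ** 2))
                for k in range(dim):
                    vector[k] += pull * (record[k] - vector[k])
    return weights


# Hypothetical records: (income, age), both already scaled to [0, 1].
records = [(0.10, 0.20), (0.15, 0.25), (0.12, 0.18),
           (0.85, 0.90), (0.90, 0.80), (0.88, 0.86)]
som = train_som(records, rows=2, cols=2)
for record in records:
    print(record, "->", nearest_unit(som, record))
```

The grid position printed for each record plays the role of its cluster label. On well-separated data such as this, records from different groups would typically be won by different units, while units that win few or no records correspond to the weak units mentioned above.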

Kohonen Node: hands-on together!

Kohonen Stream

Partial results

Clustering results

C5.0 Node

Requirements. To train a C5.0 model, you need one or more In fields and one or more symbolic Out field(s). Fields set to Both or None are ignored. Fields used in the model must have their types fully instantiated.

Strengths. C5.0 models are quite robust in the presence of problems such as missing data and large numbers of input fields. They usually do not require long training times to estimate. In addition, C5.0 models tend to be easier to understand than some other model types, since the rules derived from the model have a very straightforward interpretation. C5.0 also offers the powerful boosting method to increase accuracy of classification.

C5.0: Basic Principles

This node uses the C5.0 algorithm to build either a decision tree or a ruleset. A C5.0 model works by splitting the sample based on the field that provides the maximum information gain. Each subsample defined by the first split is then split again, usually based on a different field, and the process repeats until the subsamples cannot be split any further.
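
To make "maximum information gain" concrete, here is a minimal sketch of how a single split field could be chosen. It illustrates only the plain entropy-based gain for symbolic fields, not the C5.0 algorithm itself (no recursion, pruning, or boosting), and the helper names and the four sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(records, field, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[field]].append(rec[target])   # one subsample per field value
    before = entropy([rec[target] for rec in records])
    after = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return before - after


def best_split_field(records, fields, target):
    """Pick the field whose split yields the largest information gain."""
    return max(fields, key=lambda f: information_gain(records, f, target))


# Hypothetical sample: mortgage separates the risk classes, gender does not.
sample = [
    {"mortgage": "y", "gender": "m", "risk": "good"},
    {"mortgage": "y", "gender": "f", "risk": "good"},
    {"mortgage": "n", "gender": "m", "risk": "bad"},
    {"mortgage": "n", "gender": "f", "risk": "bad"},
]
print(best_split_field(sample, ["mortgage", "gender"], "risk"))
```

In this sample, splitting on mortgage leaves each subsample with a single risk class, so it removes all of the uncertainty about the target, whereas splitting on gender removes none. A tree builder would apply the same choice again inside each subsample.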