
Bayesian Learning: Naïve Bayes and Bayes Nets (朴素贝叶斯和贝叶斯网络)
李云

Axioms of Probability Theory

- All probabilities lie between 0 and 1: 0 ≤ P(A) ≤ 1.
- A true proposition (真命题) has probability 1, and a false one has probability 0: P(true) = 1, P(false) = 0.
- The probability of a disjunction is P(A ∨ B) = P(A) + P(B) − P(A ∧ B).

Conditional Probability

- P(A | B) is the probability of A given B, assuming that B is all and only the information known.
- Defined by P(A | B) = P(A ∧ B) / P(B).

Independence

- A and B are independent iff P(A ∧ B) = P(A) · P(B). Therefore, if A and B are independent, P(A | B) = P(A). These two constraints are logically equivalent.

Joint Distribution (联合概率分布)

- The joint probability distribution for a set of random variables X1, …, Xn gives the probability of every combination of values (an n-dimensional array with v^n entries if all variables are discrete with v values; all v^n entries must sum to 1): P(X1, …, Xn) specifies the probability of every possible assignment to the variable set.
- The probability of any conjunction (an assignment of values to some subset of the variables) can be calculated by summing the appropriate subset of entries from the joint distribution. Therefore, all conditional probabilities can also be calculated.

  positive:   circle  square        negative:   circle  square
  red         0.20    0.02          red         0.05    0.30
  blue        0.02    0.01          blue        0.20    0.20

Bayesian Theorem (贝叶斯理论)

- Given training data X, the posterior probability of a hypothesis H, P(H | X), follows Bayes' theorem: P(H | X) = P(X | H) · P(H) / P(X).
- Informally, this can be written as: posterior = likelihood × prior / evidence.
- Predict that X belongs to class C2 iff the probability P(C2 | X) is the highest among P(Ck | X) for all k classes: maximum a posteriori (极大后验概率, MAP).
- Practical difficulties: requires initial knowledge of many probabilities, and the computational cost is significant.

Naïve Bayesian Classifier

- Let D be a training set of tuples with class labels, each tuple represented by an n-dimensional attribute vector X = (x1, x2, …, xn).
- Suppose there are m classes C1, C2, …, Cm.
- Classification derives the maximum posterior, i.e., the maximal P(Ci | X), which can be computed by Bayes' theorem: P(Ci | X) = P(X | Ci) · P(Ci) / P(X).
- Since P(X) is constant for all classes, only the likelihood term P(X | Ci) · P(Ci) needs to be maximized.

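The joint-distribution slides above note that any marginal or conditional probability can be computed by summing the appropriate entries of the joint table. A minimal sketch in Python, using the color/shape joint distribution from the slides (the helper names `marginal` and `conditional` are mine, not from the slides):

```python
# Joint distribution over (color, shape, class) taken from the slides;
# every entry is P(color, shape, class), and all entries sum to 1.
joint = {
    ("red",  "circle", "positive"): 0.20,
    ("red",  "square", "positive"): 0.02,
    ("blue", "circle", "positive"): 0.02,
    ("blue", "square", "positive"): 0.01,
    ("red",  "circle", "negative"): 0.05,
    ("red",  "square", "negative"): 0.30,
    ("blue", "circle", "negative"): 0.20,
    ("blue", "square", "negative"): 0.20,
}

NAMES = ("color", "shape", "cls")

def marginal(**fixed):
    """P(fixed values) = sum of the joint entries matching those values."""
    return sum(p for key, p in joint.items()
               if all(key[NAMES.index(n)] == v for n, v in fixed.items()))

def conditional(target, given):
    """P(target | given) = P(target and given) / P(given)."""
    return marginal(**target, **given) / marginal(**given)

assert abs(sum(joint.values()) - 1.0) < 1e-9     # a valid joint distribution
print(round(marginal(color="red"), 2))           # 0.57
print(round(conditional({"cls": "positive"},
                        {"color": "red", "shape": "circle"}), 2))  # 0.8
```

Every query against the table reduces to the same two operations: sum for marginals, then a ratio of sums for conditionals, exactly as the slide states.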
Derivation of the Naïve Bayes Classifier

- A simplifying assumption: attributes are conditionally independent given the class (i.e., there are no dependencies between attributes), so P(X | Ci) = ∏ (k = 1..n) P(xk | Ci).
- This greatly reduces the computational cost: only the class distributions need to be counted.
- If Ak is a categorical attribute, P(xk | Ci) = (number of tuples of class Ci with Ak = xk) / |Ci,D| (the size of class Ci).
- If Ak is continuous-valued, P(xk | Ci) is usually computed from a Gaussian distribution with mean μ and standard deviation σ.

Naïve Bayes Classification: Training Dataset

- Two classes: buys_computer = yes and buys_computer = no.
- Data sample X = (age <= 30, income = medium, student = yes, credit_rating = fair).

Naïve Bayesian Classifier: Example

- P(Ci):
  P(buys_computer = "yes") = 9/14 = 0.643
  P(buys_computer = "no") = 5/14 = 0.357
- Compute P(X | Ci) for each class:
  P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
  P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
  P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
  P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
  P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
  P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
  P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
  P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
- For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
  P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
  P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
- P(X | Ci) · P(Ci):
  P(X | buys_computer = "yes") · P(buys_computer = "yes") = 0.028
  P(X | buys_computer = "no") · P(buys_computer = "no") = 0.007
- Therefore, X belongs to class buys_computer = "yes".

Avoiding the Zero-Probability Problem

- Normally, probabilities are estimated from observed frequencies in the training data. Naïve Bayes requires every conditional probability to be non-zero, yet an estimated probability may be zero.
- Ex.: suppose there are 1000 tuples, with income = low (0 tuples), income = medium (990), and income = high (10).
- Use the Laplacian correction (Laplacian estimator, 校准/估计法): add 1 to each case:
  Prob(income = low) = 1/1003
  Prob(income = medium) = 991/1003
  Prob(income = high) = 11/1003
- The "corrected" probability estimates are close to the uncorrected ones.

Bayesian Classification: Why?

- A statistical classifier: performs probabilistic prediction, i.e., predicts class-membership probabilities.
- Foundation: based on Bayes' theorem. Tends to work well despite the strong assumption of conditional independence.
- Performance: a simple Bayesian classifier, naïve Bayes, is comparable to decision trees and selected neural-network classifiers.
- Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct, and prior knowledge can be combined with observed data.
- Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured.

Comments on Naïve Bayes

- Although it does not produce accurate probability estimates when its independence assumptions are violated, it may still pick the correct maximum-probability class in many cases.
- Able to learn conjunctive concepts in any case.
- Does not perform any search of the hypothesis
space; it directly constructs a hypothesis from parameter estimates that are easily calculated from the training data.
- Strong bias; consistency with the training data is not guaranteed.
- Typically handles noise well, since it does not even focus on fitting the training data completely.

Naïve Bayesian Classifier: Comments (2)

- Advantages: easy to implement; good results obtained in most cases.
- Disadvantages: the assumption of class-conditional independence costs accuracy, because in practice dependencies exist among variables. E.g., in a hospital, patients have profiles (age, family history, etc.), symptoms (fever, cough, etc.), and diseases (lung cancer, diabetes, etc.), and the dependencies among these cannot be modeled by a naïve Bayesian classifier.
- How to deal with these dependencies? Bayesian belief networks.

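The buys_computer example worked through above can be reproduced in a few lines. The numbers below are the class priors and conditional probabilities computed on the slides; the helper name `score` is mine:

```python
# Class priors and per-attribute conditional probabilities for
# X = (age <= 30, income = medium, student = yes, credit_rating = fair),
# taken from the worked example on the slides.
prior = {"yes": 9 / 14, "no": 5 / 14}

cond = {  # P(attribute value | buys_computer = class)
    "yes": {"age<=30": 2 / 9, "income=medium": 4 / 9,
            "student=yes": 6 / 9, "credit=fair": 6 / 9},
    "no":  {"age<=30": 3 / 5, "income=medium": 2 / 5,
            "student=yes": 1 / 5, "credit=fair": 2 / 5},
}

def score(cls):
    """Unnormalized posterior P(X | Ci) * P(Ci) under the naive assumption."""
    p = prior[cls]
    for likelihood in cond[cls].values():
        p *= likelihood  # conditional independence: multiply per-attribute terms
    return p

scores = {cls: score(cls) for cls in prior}
print({cls: round(s, 3) for cls, s in scores.items()})  # {'yes': 0.028, 'no': 0.007}
print(max(scores, key=scores.get))                      # yes
```

Since only the argmax matters, the shared denominator P(X) is never computed, which matches the slides' observation that P(X) is constant across classes.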
Bayesian Belief Networks (贝叶斯信念网络)

- Bayesian belief networks (also called Bayesian networks or probabilistic networks) allow class-conditional independence between subsets of the variables.
- A (directed acyclic) graphical model of causal relationships: it represents dependencies among variables and gives a specification of the joint probability distribution.
- Example graph with nodes X, Y, Z, P:
  Nodes: random variables.
  Links: dependencies.
  X and Y are the parents of Z, and Y is the parent of P.
  There is no dependency between Z and P.
  The graph has no cycles.

Bayesian Belief Networks: An Example

- Nodes: FamilyHistory (FH), Smoker (S), LungCancer (LC), PositiveXRay, emphysema (肺气肿), dyspnea (呼吸困难).
- CPT (Conditional Probability Table) for the variable LungCancer, with one column for each possible combination of its parents' values:

           (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
  LC        0.8      0.5       0.7       0.1
  ~LC       0.2      0.5       0.3       0.9

- The CPT describes the conditional distribution of a variable given each combination of its predecessors' (parents') values.
- The probability of a particular assignment of values to X can be derived from the CPTs.

Training Bayesian Networks: Several Scenarios

- Scenario 1: the network structure is given and all variables are observed: only the CPT entries need to be computed.
- Scenario 2: the network structure is known but some variables are hidden: gradient descent (greedy hill-climbing), i.e., search for a solution along the direction of steepest descent of a criterion function.
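One way to picture the CPT above is as a lookup table keyed by the parents' values. A minimal sketch (the function name `p_lc` is mine; the slides give only the CPT for LungCancer, so no priors for FH or S are used):

```python
# CPT for LungCancer (LC) from the slides: one entry per combination of
# parent values (FamilyHistory FH, Smoker S), storing P(LC = yes | FH, S).
cpt_lc = {
    (True,  True):  0.8,   # (FH, S)
    (True,  False): 0.5,   # (FH, ~S)
    (False, True):  0.7,   # (~FH, S)
    (False, False): 0.1,   # (~FH, ~S)
}

def p_lc(lc, fh, s):
    """P(LC = lc | FH = fh, S = s): read the CPT; the ~LC row is the complement."""
    p_yes = cpt_lc[(fh, s)]
    return p_yes if lc else 1.0 - p_yes

# Each CPT column must sum to 1 over the values of LC.
for parents in cpt_lc:
    assert abs(p_lc(True, *parents) + p_lc(False, *parents) - 1.0) < 1e-9

print(p_lc(True, True, True))             # 0.8  (family history and smoker)
print(round(p_lc(False, False, True), 1)) # 0.3  (~LC given ~FH, S)
```

Storing only the "yes" row and deriving the complement mirrors how the slide's two-row table carries a single degree of freedom per column.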

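For training Scenario 1 (structure known, every variable observed), a CPT entry is just a conditional relative frequency in the data. A sketch under that assumption; the sample below is made up for illustration and is not from the slides:

```python
from collections import Counter

# Fully observed samples of (FH, S, LC); illustrative data only.
data = [
    (True, True, True), (True, True, True), (True, True, False),
    (False, True, True), (False, True, False), (False, False, False),
    (False, False, False), (True, False, True), (True, False, False),
]

# Scenario 1: count how often each parent combination occurs, and how often
# LC = yes occurs with it; their ratio estimates P(LC = yes | FH, S).
parent_counts = Counter((fh, s) for fh, s, _ in data)
lc_counts = Counter((fh, s) for fh, s, lc in data if lc)

cpt = {parents: lc_counts[parents] / parent_counts[parents]
       for parents in parent_counts}
print(cpt)
```

With hidden variables (Scenario 2) these counts are unavailable, which is why the slides switch to gradient-based search over the CPT entries instead.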