Cross-validation: my summary

First, a summary of my chat with Shao Shiyun. Her point is that, at the start, all the samples are indeed split into two parts: a larger part as the training set and a smaller part as the test set. Then, only within the training set, the data is further divided into a regular training portion and a validation portion, and this is done by cross-validation; only after all the cross-validation is finished is the small held-out test set evaluated separately. This corresponds to the training accuracy (regular training portion), cross-validation rate (validation portion) and test accuracy (test set) you mentioned. She says this is the standard procedure; if instead the whole dataset is used for training and cross-validation, then all the samples have been divided only into a training set and a validation set, and there is no such thing as a test accuracy.

A common way to estimate accuracy is cross-validation, for example 10-fold cross-validation: the dataset is split into ten parts, and in turn 9 parts are used for training and 1 part for testing; the mean of the 10 results is taken as the estimate of the algorithm's accuracy. Usually several rounds of 10-fold cross-validation are run and averaged, e.g. 10 times 10-fold cross-validation, which is a bit more accurate.

When the number of training samples is too small, "cross-validation" is used. There are two variants. 1. K-fold cross-validation: this is one of the most common ways to estimate the generalization error. The procedure is: randomly divide the training sample set into K subsets, usually K equal parts; train on K-1 of the subsets to obtain a decision function, and test it on the remaining subset. Repeat this K times, and take the average test error over the K runs as the generalization error.
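A minimal sketch of 10 x 10-fold cross-validation; the note names no library, so scikit-learn and a synthetic toy dataset are assumed here:

from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)  # toy data
clf = SVC(kernel="rbf", C=1.0, gamma="scale")

# 10 repetitions of 10-fold CV: 100 accuracy values in total.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)
print(scores.mean(), scores.std())  # the mean is the accuracy estimate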

2. Leave-one-out: this can be regarded as the extreme case of K-fold cross-validation, with K = L, where L is the size of the entire training sample set. The procedure is: take out the i-th training sample, train on the remaining L-1 samples to obtain a decision function, and use it to test the i-th sample; repeat this L times. The error estimated this way is almost unbiased with respect to the actual test error. (Note that when there are too few samples, even cross-validation will not give ideal results; in general there should be at least about 100 samples.)

In Chinese, k-fold validation is called "k-折交叉验证 (确认)". The k is chosen by the user, but it must not exceed the number of elements n in the original training set, i.e. k <= n. The well-known LOO (leave-one-out) is a special case of k-fold validation in which k = n.
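A sketch of leave-one-out cross-validation under the same assumptions (scikit-learn, synthetic data):

from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)  # toy data, n > 100
# LeaveOneOut builds n models: each sample is the test set exactly once.
scores = cross_val_score(SVC(kernel="rbf"), X, y, cv=LeaveOneOut())
print(scores.mean())  # fraction of samples predicted correctly; a nearly unbiased error estimate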

k-fold validation is often used when training NNs, SVMs and the like in order to determine an optimal parameter. The basic idea is to split the original training set into two parts: "training set 2" (called that here to distinguish it from the original training set) and a validation set. Select n/k elements of the original training set to form the validation set and use the remaining (k-1)*n/k elements as training set 2; train the NN, SVM, etc. on training set 2 and use the validation set to measure the error rate of the resulting classifier (classification is taken as the example here; for regression it should work the same way). Then select another n/k elements to form the validation set, use the rest as training set 2, and loop until every group of n/k elements has been selected once. Compare the classifier error rates obtained in the loops and take the parameter value with the lowest error rate as the optimal parameter.

k-fold cross-validation cannot tune every kind of parameter. It can only tune discrete parameters, such as the number of hidden nodes in a network; continuous parameters cannot be tuned this way. The network weights are adjusted by the learning algorithm itself, and using a validation set merely to decide whether the network is over-trained has nothing to do with k-fold cross-validation. Apart from parameter selection, what is k-fold cross-validation mainly for? It estimates the generalization error of a trained network from the average of the errors of the k validation rounds on a sample set.
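A sketch of selecting a discrete parameter (here the number of hidden nodes of a small network) by k-fold CV; scikit-learn and the candidate values are assumptions, not part of the original note:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=0)

best_h, best_score = None, -1.0
for h in (5, 10, 20, 40):                        # discrete candidates for the hidden-node count
    net = MLPClassifier(hidden_layer_sizes=(h,), max_iter=1000, random_state=0)
    score = cross_val_score(net, X, y, cv=5).mean()   # k = 5 folds here
    if score > best_score:
        best_h, best_score = h, score
print(best_h, best_score)  # the candidate with the lowest CV error (highest CV accuracy)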

Structural risk minimization and the VC dimension. With a finite number of training samples, when the sample size n is fixed, the higher the VC dimension of the learning machine, the higher its complexity. The VC dimension reflects the learning capacity of the function set: the larger the VC dimension, the more complex the learning machine (the larger its capacity). Structural risk minimization means lowering the VC dimension of the learning machine while maintaining the classification accuracy (the empirical risk), so that the expected risk of the learning machine over the whole sample set is kept under control.

The generalization bound (the relation between the empirical risk and the actual risk; why is it introduced? because no matter how small the training error is, it is only the error on this particular training set, and if the actual generalization ability is poor, overfitting occurs; so a confidence interval, i.e. the relation between the empirical error and the actual expected error, has to be introduced):

    R(w) ≤ Remp(w) + Φ(n/h)

Here Remp(w) is the empirical error, i.e. the training error (in the separable case all training samples are classified correctly), and Φ(n/h) is the confidence interval, which depends on the number of samples n and the VC dimension h. In this bound the confidence interval Φ decreases monotonically as n/h increases. When n/h is small, the confidence interval is large, and approximating the actual risk by the empirical risk involves a large error, so the optimum obtained by the empirical risk minimization principle may generalize poorly; when there are many samples and n/h is large, the confidence interval becomes small and the optimum obtained by empirical risk minimization is close to the actual optimum. It follows that the upper bound on the expected risk is affected by two factors: the size n of the training set and the VC dimension h.

So, lowering the VC dimension of the learning machine while maintaining the classification accuracy (empirical risk) keeps the expected risk over the whole sample set under control; this is the origin of Structural Risk Minimization (SRM). With a finite number of training samples and fixed n, the higher the VC dimension of the learning machine (the more complex it is), the larger the confidence interval, and the larger the gap between the true risk and the empirical risk; this is why over-learning (overfitting) occurs. The learning process must therefore not only minimize the empirical risk but also keep the VC dimension as small as possible, in order to shrink the confidence interval and obtain a small actual risk, i.e. good generalization to future samples; this depends on both the VC dimension of the learning machine and the number of training samples.
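For reference, one common explicit form of this bound (the classical Vapnik bound for 0-1 loss; the confidence level η and this particular form are standard textbook material, not written out in the original note): with probability at least 1 - η,

\[ R(w) \;\le\; R_{\mathrm{emp}}(w) \;+\; \sqrt{\frac{h\,\bigl(\ln(2n/h)+1\bigr) \;-\; \ln(\eta/4)}{n}} \]

The square-root term is the confidence interval Φ: it grows with the VC dimension h and shrinks as n/h grows, exactly as described above.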

In the linearly separable case, the optimal separating hyperplane is required not only to separate the two classes of samples correctly (training error rate 0) but also to make the margin between the two classes as large as possible. (What is this about? It turns out there is a reason for it, and it puzzled me for quite a while. For a margin Δ, the VC dimension h of the set of separating hyperplanes satisfies h = f(1/Δ²), where f(·) is a monotonically increasing function, i.e. h is inversely proportional to the square of the margin. Therefore, for a given training set, the larger the margin, the smaller the VC dimension of the corresponding set of separating hyperplanes.) According to the structural risk minimization principle, the former (correct separation) keeps the empirical risk minimal (both the empirical risk and the expected risk depend on the choice of the learning machine's function family), while the latter (maximizing the margin) makes the VC dimension minimal; in effect it minimizes the confidence interval in the generalization bound and thereby minimizes the true risk. Note: a large confidence interval means a large gap between the true risk and the empirical risk.

Having got this far, things finally start to make sense; so that is what it is all about. To summarize: when the training samples are linearly separable, all samples can be classified correctly (isn't this exactly the famous condition yi(w·xi + b) ≥ 1?), i.e. under the premise that the empirical risk Remp is 0, the classifier obtains the best generalization performance by maximizing the margin (which is exactly minimizing (1/2)‖w‖²).

Having explained the linearly separable case, we know that in practice the data are often not linearly separable; is there any difference? Of course there is. What, then, is the essential difference? The essential difference is that we no longer know whether the data are linearly separable, and misclassified samples are allowed (I had not fully understood this part either). Precisely because misclassified samples are allowed, the soft-margin separating hyperplane is the hyperplane with the maximum margin once those misclassified samples are set aside. A new term appears here: the slack variables. What are they for? They are used to control the misclassified samples, so the empirical risk becomes tied to the slack variables.
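For reference, this is the standard soft-margin primal being described; the slack variables ξi and the penalty C of the next paragraph appear in it explicitly (the formulation is textbook SVM material, not taken from this note):

\[ \min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i} \quad\text{s.t.}\quad y_{i}(w\cdot x_{i}+b)\ \ge\ 1-\xi_{i},\qquad \xi_{i}\ge 0 . \]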

C is the coefficient in front of the slack term: C > 0 is a user-defined penalty factor. It controls how heavily misclassified samples are penalized and thus controls the trade-off between the sample error and the machine's generalization ability. The smaller C is, the smaller the penalty, so the training error becomes larger and the structural risk grows as well; the larger C is, the heavier the penalty and the stronger the constraint on misclassified samples, but then the second term (the confidence interval) gets a larger weight, the relative weight of the margin becomes smaller, and the generalization ability of the system deteriorates. So choosing a suitable C is really necessary.

Choosing the kernel function. There are many kinds of kernels, such as the linear kernel, the polynomial kernel, the Sigmoid kernel and the RBF (Radial Basis Function) kernel. Here the RBF kernel is chosen as the kernel of the SVM: K(x, y) = exp(-γ‖x - y‖²), γ > 0. Because the RBF kernel maps the samples into a higher-dimensional space, it can handle cases where the relation between the class labels and the features is nonlinear. Keerthi et al. [25] showed that a linear kernel with penalty parameter C has the same performance as an RBF kernel with some parameters (C, γ) (where C is the penalty factor and γ the kernel parameter). For certain parameters the Sigmoid kernel behaves similarly to the RBF kernel [26]. In addition, compared with the polynomial kernel, the RBF kernel has the advantage of fewer parameters, and the number of parameters directly affects the complexity of model selection. A very important point is that for the RBF kernel 0 < Kij ≤ 1, whereas the values of the polynomial kernel may tend to infinity (when γ·xi·xj + r > 1) or lie in 0 < γ·xi·xj + r < 1, so their range is very large. Moreover, it must be noted that the Sigmoid kernel is not valid under some parameters (for example, it is then not the inner product of two vectors).
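A minimal sketch of an RBF-kernel SVM with explicit C and γ; scikit-learn (whose SVC wraps libsvm) is assumed here, and the data and parameter values are arbitrary placeholders:

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = (X[:, 0] * X[:, 1] > 0).astype(int)      # a nonlinear toy problem

clf = SVC(kernel="rbf", C=10.0, gamma=0.5)   # K(x, y) = exp(-gamma * ||x - y||^2)
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy only; it says little about generalization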

(4) Use cross-validation to find the best parameters C and γ. When the RBF kernel is used, two parameters, C and γ, have to be considered. Since there is no definite prior knowledge for choosing them, some kind of model selection (parameter search) must be done. The goal is to determine a good (C, γ) so that the classifier can correctly predict unknown data (i.e. the test-set data) with high classification accuracy. It is worth noting that a high training accuracy (i.e. the accuracy of the classifier on training data whose class labels are known) does not guarantee a high prediction accuracy on the test set; therefore cross-validation is usually used to improve the prediction accuracy. k-fold cross-validation splits the training set into k subsets of the same size; one subset is used for testing and the other k-1 subsets are used to train the classifier. In this way every subset of the whole training set is predicted once, and the cross-validation accuracy is the average of the k percentages of correctly classified data. It helps guard against overfitting.

The whole procedure can be summarized as the following (sequential) steps; a code sketch of steps 2-8 is given after the list.

1. Collect the data; do correlation analysis (e.g. the chi-square test) and feature selection (e.g. principal component analysis).
2. Normalize the data, i.e. map the value range of the data into a common interval [a, b] (a, b integers) according to the actual requirements. For the method see: http:/slt-
3. Split the dataset into a training set and a test set using a sampling technique. Sampling techniques include stratified sampling and simple (equal-probability) sampling.
4. Convert the data into the format supported by the software (interface). For libsvm (C, Java), FormatDataLibsvm.xls can be used to convert the data into the format libsvm requires. See: http:/slt-
5. Choose the kernel function; RBF can be considered first.
6. Use cross-validation on the training set to choose the best parameters C and γ (the gamma parameter of the RBF kernel). The optimal parameters can be found with a grid search: note that one cross-validation run yields the model accuracy for one parameter combination, and the aim of the grid search is to find the parameter combination with the highest model accuracy (there may be more than two parameters involved). Some heuristic search can be used to reduce the complexity; although the grid method is a bit brute-force, it gives very stable search results. It should be mentioned that splitting the training set here again involves sampling, and stratified sampling is a good choice. This step also shows that cross-validation is really a method for evaluating an algorithm.
7. Train on the whole training set with the parameter pair obtained in step 6 to obtain the model.
8. Test the model on the test set to obtain its accuracy, which can be regarded as the final accuracy of the model. Of course, one may worry that the sampling in step 3 introduces some error, so the accuracy obtained in step 8 is not necessarily the best; steps 3-8 can therefore be repeated to obtain the accuracies of several models, and the best of them (or the mean of all of them) taken as the model's accuracy.
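A compact sketch of steps 2-8; scikit-learn is assumed instead of the libsvm spreadsheet tools named above, and the data, the [0, 1] range and the parameter grid are illustrative choices only:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, random_state=0)

# Step 3: stratified split into training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Steps 2, 5 and 6: scale to a common interval, RBF kernel, grid search over (C, gamma)
# with stratified k-fold CV on the training set only.
pipe = Pipeline([("scale", MinMaxScaler(feature_range=(0, 1))), ("svm", SVC(kernel="rbf"))])
grid = {"svm__C": 10.0 ** np.arange(-2, 4), "svm__gamma": 10.0 ** np.arange(-4, 2)}
search = GridSearchCV(pipe, grid, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X_tr, y_tr)          # Step 7: refits on the whole training set with the best pair

# Step 8: the held-out test set is touched exactly once, at the very end.
print(search.best_params_, search.score(X_te, y_te))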

How to use Cross-Validation

The main purpose of this note is to explain how to use cross-validation correctly and to give examples of some common mistakes. It assumes that you already have a basic knowledge of pattern recognition; hopefully it will help with the experimental part of your papers.

In research on pattern recognition and machine learning, the dataset is usually divided into two subsets, training and test: the former is used to build the model, and the latter is used to evaluate how accurately the model predicts unseen samples, formally called its generalization ability. Before going any further, one extremely important principle has to be pointed out: only the training data may be used in the training of the model; the test data may only be used, after the model is finished, as the basis for evaluating how good the model is.

How to divide the complete dataset into a training set and a test set is itself a craft, and two points must be observed: 1. the training set must contain enough samples, generally at least more than 50% of the total; 2. the two subsets must be sampled uniformly from the complete set. The second point is particularly important: the purpose of uniform sampling is to reduce the bias between the training/test sets and the complete set, but it is not easy to achieve. The usual approach is random sampling; when the number of samples is sufficient, this achieves the effect of uniform sampling. Randomness, however, is also the blind spot of this approach, and it is often where the numbers can be massaged. For example, when the recognition rate is unsatisfactory, one simply re-samples a new training set and test set until the test-set recognition rate is satisfactory, but strictly speaking this amounts to cheating.
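One common way to get an approximately uniform split is stratified random sampling; a minimal sketch, again assuming scikit-learn and made-up data (the 70/30 ratio is just an example):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
print(np.bincount(y) / len(y), np.bincount(y_tr) / len(y_tr), np.bincount(y_te) / len(y_te))
# All three class-frequency vectors are (almost) identical, i.e. "uniform" sampling.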

Cross-validation is an experimental methodology designed precisely to estimate the generalization error effectively, and it can be subdivided into double cross-validation, k-fold cross-validation and leave-one-out cross-validation.

Double cross-validation, also called 2-fold cross-validation (2-CV), splits the dataset into two subsets of equal size and runs two rounds of classifier training: in the first round one subset serves as the training set and the other as the test set; in the second round the training set and test set are swapped and the classifier is trained again, and what we mainly care about are the recognition rates on the two test sets. In practice, however, 2-CV is not used much, mainly because the training set then has too few samples and is usually not representative of the distribution of the population, so the recognition rate in the test stage tends to show an obvious drop. In addition, the variability of the subset split in 2-CV is large, so the requirement that the experimental procedure must be reproducible often cannot be met.

K-fold cross-validation (k-CV) is an extension of double cross-validation: the dataset is cut into k subsets of equal size, and each subset in turn serves once as the test set, with the remaining samples as the training set; one k-CV experiment therefore requires building k models and computing the average recognition rate over the k test sets. In practice, k must be large enough for the training set in each round to contain enough samples; generally speaking, k = 10 is quite sufficient.

Finally there is leave-one-out cross-validation (LOOCV): if the dataset contains n samples, LOOCV is simply n-CV, meaning that each sample serves on its own as a test set once and the remaining n-1 samples form the training set, so one LOOCV run requires building n models. Compared with the k-CV described above, LOOCV has two clear advantages:
· in each round almost all of the samples are used to train the model, so it comes closest to the distribution of the population and the estimated generalization error is more reliable;
· no random factor affects the experimental data during the procedure, which ensures that the experimental procedure can be reproduced.
The drawback of LOOCV is its high computational cost, since the number of models to be built equals the total number of samples; when the total number of samples is rather large, LOOCV is hard to carry out in practice, unless each model can be trained very quickly or parallel computation can be used to reduce the time required.

Common mistakes when using Cross-Validation

Since many studies in the lab use evolutionary algorithms (EA) together with classifiers, and the fitness functions used usually involve the classifier's recognition rate, there are still quite a few cases where cross-validation is used incorrectly.

As stated earlier, only the training data may be used to construct the model, so only the recognition rate on the training data may be used in the fitness function. The EA is the method that tunes the model's optimal parameters during the training process, so the test data may only be used after the EA has finished evolving and the model parameters are already fixed. How, then, should EA and cross-validation be combined? The essence of cross-validation is to estimate the generalization error of a classification method on a dataset; it is not a method for designing classifiers, so cross-validation cannot be used inside the EA's fitness function: the samples involved in the fitness function all belong to the training set, so which samples would then be the test set? If a fitness function uses the training or test recognition rate of a cross-validation, then such an experimental method can no longer be called cross-validation. The correct way to combine EA with k-CV is to split the dataset into k equal subsets, take one subset as the test set each time and the remaining k-1 subsets as the training set, and feed only that training set into the EA's fitness computation (there is no restriction on how the training set is further used internally). A correct k-CV therefore runs k EA evolutions in total and builds k classifiers, and the k-CV test recognition rate is the average of the recognition rates of the k classifiers obtained from the EA training, each evaluated on its corresponding test set.
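A minimal sketch of this nesting; scikit-learn is assumed, and evolve_classifier is a made-up stand-in for a real EA (here just a toy random search over SVM parameters whose fitness only ever sees the training fold):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC

def evolve_classifier(X_train, y_train, rng):
    # toy "EA": evaluate random (C, gamma) pairs; the fitness uses only training data
    X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25,
                                                  stratify=y_train, random_state=0)
    best, best_fit = None, -1.0
    for _ in range(20):
        C, gamma = 10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-4, 1)
        clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_fit, y_fit)
        fit = clf.score(X_val, y_val)            # fitness: internal validation accuracy
        if fit > best_fit:
            best, best_fit = clf, fit
    return best

X, y = make_classification(n_samples=400, random_state=0)
rng = np.random.RandomState(0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
test_rates = []
for train_idx, test_idx in outer.split(X, y):    # k rounds -> k evolved classifiers
    model = evolve_classifier(X[train_idx], y[train_idx], rng)
    test_rates.append(model.score(X[test_idx], y[test_idx]))  # each test fold used once
print(np.mean(test_rates))   # the k-CV test recognition rate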

Cross Validation (交叉验证): below are explanations of several different kinds of cross validation (there is an idea on p. 27 in the ideas notebook).

Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross validation.

1. The holdout method is the simplest kind of cross validation. The data set is separated into two sets, called the training set and the testing set. The function approximator fits a function using the training set only. Then the function approximator is asked to predict the output values for the data in the testing set (it has never seen these output values before). The errors it makes are accumulated as before to give the mean absolute test set error, which is used to evaluate the model. The advantage of this method is that it is usually preferable to the residual method and takes no longer to compute. However, its evaluation can have a high variance. The evaluation may depend heavily on which data points end up in the training set and which end up in the test set, and thus the evaluation may be significantly different depending on how the division is made.

2. The K-fold cross validation is one way to improve over the holdout method. The data set is divided into k subsets, and the holdout method is repeated k times. Each time, one of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set. Then the average error across all k trials is computed. The advantage of this method is that it matters less how the data gets divided. Every data point gets to be in a test set exactly once, and gets to be in a training set k-1 times. The variance of the resulting estimate is reduced as k is increased. The disadvantage of this method is that the training algorithm has to be rerun from scratch k times, which means it takes k times as much computation to make an evaluation. A variant of this method is to randomly divide the data into a test and training set k different times. The advantage of doing this is that you can independently choose how large each test set is and how many trials you average over.

k-fold cross-validation (here for regression): randomly split the training sample set into k disjoint subsets of roughly equal size. For a given set of parameter values, build the regression model on k-1 of the subsets and evaluate the parameters by the MSE on the remaining subset. Repeat this K times so that every subset gets a chance to be tested, estimate the expected generalization error from the average of the k MSE values, and finally choose the best set of parameters. (A short sketch follows below.)
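A sketch of that parameter selection by average k-fold MSE; scikit-learn is assumed, and Ridge regression with its alpha grid is just an illustrative choice of model and parameter:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

best_alpha, best_mse = None, float("inf")
for alpha in (0.01, 0.1, 1.0, 10.0):
    # scoring="neg_mean_squared_error" returns the negated MSE for each fold
    mse = -cross_val_score(Ridge(alpha=alpha), X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse
print(best_alpha, best_mse)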

3. The Leave-one-out cross validation is K-fold cross validation taken to its logical extreme, with K equal to N, the number of data points in the set. That means that N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point. As before the average error is computed and used to evaluate the model. The evaluation given by leave-one-out cross validation error (LOO-XVE) is good, but at first pass it seems very expensive to compute. Fortunately, locally weighted learners can make LOO predictions just as easily as they make regular predictions. That means computing the LOO-XVE takes no more time than computing the residual error and it is a much better way to evaluate models.

k-fold cross-validation (by willmove, 2006-11-23): k-fold cross-validation means splitting the sample set into k parts, of which k-1 parts are used as the training data set and the remaining part as the validation data set; the validation set is used to measure the error rate of the resulting classifier or regression model. This is normally repeated k times, until each of the k parts has been selected once.


We will see shortly that Vizier relies heavily on LOO-XVE to choose its metacodes.

Figure 26: Cross validation checks how well a model generalizes to new data.

Fig. 26 shows an example of cross validation performing better than residual error. The data set in the top two graphs is a simple underlying function with significant noise. Cross validation tells us that broad smoothing is best. The data set in the bottom two graphs is a complex underlying function with no noise. Cross validation tells us that very little smoothing is best for this data set. Now we return to the question of choosing a good metacode for data set a1.mbl:

File -> Open -> a1.mbl
Edit -> Metacode -> A90:9
Model -> LOOPredict
Edit -> Metacode -> L90:9
Model -> LOOPredict
Edit -> Metacode -> L10:9
Model -> LOOPredict

LOOPredict goes through the entire data set and makes LOO predictions for each point. At the bottom of the page it shows the summary statistics including Mean LOO error, RMS LOO error, and information about the data point with the largest error. The mean absolute LOO-XVEs for the three metacodes given above (the same three used to generate the graphs in fig. 25) are 2.98, 1.23, and 1.80. Those values show that global linear regression is the best metacode of those three, which agrees with our intuitive feeling from looking at the plots in fig. 25. If you repeat the above operation on data set b1.mbl you'll get the values 4.83, 4.45, and 0.39, which also agrees with our observations.

What are cross-validation and bootstrapping?

Cross-validation and bootstrapping are both methods for estimating generalization error based on "resampling" (Weiss and Kulikowski 1991; Efron and Tibshirani 1993; Hjorth 1994; Plutowski, Sakata, and White 1994; Shao and Tu 1995). The resulting estimates of generalization error are often used for choosing among various models, such as different network architectures.

Cross-validation: In k-fold cross-validation, you divide the data into k subsets of (approximately) equal size. You train the net k times, each time leaving out one of the subsets from training, but using only the omitted subset to compute whatever error criterion interests you. If k equals the sample size, this is called "leave-one-out" cross-validation. "Leave-v-out" is a more elaborate and expensive version of cross-validation that involves leaving out all possible subsets of v cases.

Note that cross-validation is quite different from the "split-sample" or "hold-out" method that is commonly used for early stopping in NNs. In the split-sample method, only a single subset (the validation set) is used to estimate the generalization error, instead of k different subsets; i.e., there is no "crossing". While various people have suggested that cross-validation be applied to early stopping, the proper way of doing so is not obvious. The distinction between cross-validation and split-sample validation is extremely important because cross-validation is markedly superior for small data sets; this fact is demonstrated dramatically by Goutte (1997) in a reply to Zhu and Rohwer (1996).
