Training, Validation and Test Data

Example:

(A). We have data on 16 data items: their attributes and class labels. RANDOMLY divide them into 8 for training, 4 for validation and 4 for testing. The d attributes are KNOWN FOR ALL DATA ITEMS.

Set          Item No.   d Attributes   Class
Training     1          known          0
             2          known          0
             3          known          1
             4          known          1
             5          known          1
             6          known          1
             7          known          0
             8          known          0
Validation   9          known          0
             10         known          0
             11         known          1
             12         known          0
Test         13         known          0
             14         known          0
             15         known          1
             16         known          1

(B). Next, suppose we develop three classification models A, B, C from the training data. Let the training errors of these models be as shown below (recall that the models do not necessarily give perfect results on the training data, nor are they required to).

Classification results from
Item No.   d Attributes   True Class   Model A   Model B   Model C
1          all known      0            0         1         1
2          all known      0            0         0         0
3          all known      1            0         1         0
4          all known      1            1         0         1
5          all known      1            0         0         0
6          all known      1            1         1         1
7          all known      0            0         0         0
8          all known      0            0         0         0
Classification Error                   2/8       3/8       3/8

(C). Next, use the three models A, B, C to classify each item in the validation set based on its attribute values. Recall that we know the true labels of these items as well. Suppose we get the following results:

Classification results from
Item No.   d Attributes   True Class   Model A   Model B   Model C
9          all known      0            1         0         0
10         all known      0            0         1         0
11         all known      1            0         1         0
12         all known      0            0         1         0
Classification Error                   2/4       2/4       1/4

If we use minimum validation error as the model selection criterion, we would select model C.

(D). Now use model C to determine class values for each data point in the test set. We do so by substituting the (known) attribute values into classification model C. Again, recall that we know the true label of each of these data items, so we can compare the values obtained from the classification model with the true labels to determine the classification error on the test set. Suppose we get the following results.

Classification results from
Item No.   d Attributes   True Class   Model C
13         all known      0            0
14         all known      0            0
15         all known      1            0
16         all known      1            1
Classification Error                   1/4

(E). Based on the above, an estimate of the generalization error is 25%. This means that if we use Model C to classify future items for which only the attributes are known, not the class labels, we are likely to make incorrect classifications about 25% of the time.

(F). A summary of the above is as follows (errors in %):

Model   Training   Validation   Test
A       25         50           -
B       37.5       50           -
C       37.5       25           25

Cross Validation

If available data are limited, we employ Cross Validation (CV). In this approach, the data are randomly divided into k sets of almost equal size. Training is done on (k-1) of the sets and the remaining set is used for testing. This process is repeated k times, each time holding out a different set (k-fold CV). The average error over the k repetitions is used as a measure of the test error.

For the special case when k equals the number of data items (so that each test set holds a single item), the above is called Leave-One-Out Cross-Validation (LOO-CV).

EXAMPLE: Consider the above data consisting of 16 items.

(A). Let k = 4, i.e., 4-fold Cross Validation. Divide the data into four sets of 4 items each. Suppose the following set-up occurs and the errors obtained are as shown.

                              Set 1         Set 2              Set 3             Set 4
Training                      Items 1-12    Items 1-8, 13-16   Items 1-4, 9-16   Items 5-16
Test                          Items 13-16   Items 9-12         Items 5-8         Items 1-4
Error on test set (assumed)   25%           35%                28%               32%

Estimated Classification Error (CE) = (25 + 35 + 28 + 32) / 4 = 30%

(B). LOO-CV. For this, the data are divided into 16 sets, each consisting of 15 training items and one test item.

           Set 1        Set 2             ...   Set 15            Set 16
Training   Items 1-15   Items 1-14, 16    ...   Items 1, 3-16     Items 2-16
Test       Item 16      Item 15           ...   Item 2            Ite
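
The k-fold procedure described in the Cross Validation section can be sketched in Python as follows. This is a minimal illustration, not the handout's own code: the helper name k_fold_splits is hypothetical, the split is contiguous rather than random so that the output is reproducible (shuffling the items first would match the "randomly divided" wording), and the four fold errors are the assumed values from the 4-fold example rather than results computed from a real model. The folds come out in the opposite order to the handout's Set 1 to Set 4, which does not affect the average.

```python
def k_fold_splits(items, k):
    """Yield (training, test) pairs; fold sizes differ by at most one item."""
    n = len(items)
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional item when n is not divisible by k.
        size = base + (1 if fold < extra else 0)
        test = items[start:start + size]
        train = items[:start] + items[start + size:]
        yield train, test
        start += size


items = list(range(1, 17))  # the 16 data items, numbered as in the example

# 4-fold CV: four test sets of 4 items, each paired with the other 12 items.
splits = list(k_fold_splits(items, 4))
for train, test in splits:
    print("train:", train, "test:", test)

# Assumed per-fold test errors (in %) taken from the 4-fold example.
fold_errors = [25, 35, 28, 32]
cv_error = sum(fold_errors) / len(fold_errors)
print("Estimated classification error:", cv_error, "%")

# LOO-CV is the same procedure with k equal to the number of items.
loo_splits = list(k_fold_splits(items, len(items)))
print("LOO-CV sets:", len(loo_splits))
```

In practice each (training, test) pair would be used to fit a model and measure its error on the held-out items; here the errors are simply plugged in to show the averaging step.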
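
Returning to the train/validation/test example in parts (A) to (F), the error rates and the model selection step can be reproduced with a short Python sketch. The labels and predictions below are copied from the tables in parts (B) to (D); the function name error_rate and the variable names are illustrative assumptions, not part of the original handout.

```python
def error_rate(true_labels, predicted):
    """Fraction of items whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted))
    return wrong / len(true_labels)


# Training set, items 1-8 (part B).
train_true = [0, 0, 1, 1, 1, 1, 0, 0]
train_pred = {
    "A": [0, 0, 0, 1, 0, 1, 0, 0],
    "B": [1, 0, 1, 0, 0, 1, 0, 0],
    "C": [1, 0, 0, 1, 0, 1, 0, 0],
}

# Validation set, items 9-12 (part C).
val_true = [0, 0, 1, 0]
val_pred = {
    "A": [1, 0, 0, 0],
    "B": [0, 1, 1, 1],
    "C": [0, 0, 0, 0],
}

# Test set, items 13-16, classified by model C only (part D).
test_true = [0, 0, 1, 1]
test_pred_c = [0, 0, 0, 1]

train_err = {m: error_rate(train_true, p) for m, p in train_pred.items()}
val_err = {m: error_rate(val_true, p) for m, p in val_pred.items()}

# Model selection: pick the model with the smallest validation error.
best_model = min(val_err, key=val_err.get)

# The test set is used once, on the selected model, to estimate generalization error.
test_err = error_rate(test_true, test_pred_c)

print("training errors:  ", train_err)
print("validation errors:", val_err)
print("selected model:   ", best_model)
print("test error:       ", test_err)
```

Note that the test set plays no part in choosing between the models; it is touched only after the selection has been made, which is what lets its error serve as an estimate of generalization error.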