
UCI Database Usage Instructions

Usage instructions for the UCI data sets in the machine learning field

This directory contains data sets and domain theories (the latter are annotated as such in the brief listing further below) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains records of many individual samples described as attribute-value pairs. The corresponding *.info file contains extensive documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to the data sets and domain theories, the utilities directory contains material that is useful when working with these data sets.

The address is /mlearn/MLRepository.html; the UCI data sets there can be viewed and copied remotely over the web. Alternatively, the data can be obtained via ftp (ftp:/ .) using anonymous login; they are in the pub/machine-learning-databases directory.

UCI is always looking for new data to add, which can be written to the incoming subdirectory. Please contribute your data together with the corresponding documentation. The contribution procedure is described in the DOC-REQUIREMENTS file. At present, most data use the following format: one instance per line, no spaces, attribute values separated by commas ",", and missing values denoted by a question mark "?". Please also notify the site librarian after making your contribution.

The IRIS data set is used below as an example. The uci\data\iris directory contains three files.

Index is the directory listing; it names every file in the folder. For iris, its content is:

Index of iris
18 Mar 1996      105 Index
08 Mar 1993     4551 iris.data
30 May 1989     2604 iris.names

iris.data is the iris data file. Its content looks like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica

As shown, the attribute values are separated directly by commas with no spaces in between (5.1,3.5,1.4,0.2,), and the last column is the value that the attributes in that row map to, i.e. the decision attribute (the class).

iris.names describes the iris data: its title, sources, past usage, relevant information, number of instances, attributes of each instance, and so on. An excerpt:

7. Attribute Information:
    1. sepal length in cm
    2. sepal width in cm
    3. petal length in cm
    4. petal width in cm
    5. class:
        - Iris Setosa
        - Iris Versicolour
        - Iris Virginica
9. Class Distribution: 33.3% for each of 3 classes.

For examples of how this data set has been used, see other papers or the later content on this site.

This is the UCI Repository Of Machine Learning Databases and Domain Theories

4 December 1995
: pub/machine-learning-databases
/mlearn/MLRepository.html
Librarian: Patrick M. Murphy ( )
111 databases and domain theories (36MB)

This directory contains data sets and domain theories (the latter have been annotated as such in the following brief listing) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains individual records described in terms of attribute-value pairs. The corresponding *.info file contains voluminous documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to data sets and domain theories, the utilities/ directory contains utilities that you may find useful when using datasets in this repository.

The contents of this repository can be viewed and remotely copied over the web. The address is /mlearn/MLRepository.html. Alternatively, the contents of this repository can be remotely copied via ftp to . Enter anonymous for user id, and e-mail address (user@host) for password. These databases can be found by executing cd pub/machine-learning-databases.

Notes:

1. We're always looking for additional databases, which can be written to the sub-directory named /incoming. Please send yours, with documentation. Thanks. See DOC-REQUIREMENTS for suggested documentation procedures. Presently, most databases have the following format: 1 instance per line, no spaces, commas separate attribute values, and missing values are denoted by ?. Also, please notify the site librarian ( ) after making a donation.

2. Ivan Bratko requested that the databases he donated from the Ljubljana Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor) have restricted access. We are allowed to share them with academic institutions upon request. These databases (like several others) require that proper citations be made in published articles that use them. Citation requirements are in each database's corresponding *.doc file. To access any of these databases, send email to . To aid you in deciding if you want any of these databases, the documentation files are available.

3. An archive server may now be used to receive via e-mail files in this repository. Installed on ics, it provides email access to files in our anonymous ftp/uucp area (ftp). If people have no other access to our archives, then they can send mail to: Commands to the server may be given in the body. Some commands are:
    help
    send
    find
The help command replies with a useful help message.

If you publish material based on databases obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this repository. Thanks - this will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository (LaTeX'd):

Murphy, P. M., & Aha, D. W. (1994). {\it UCI Repository of machine learning databases} /mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.

Patrick M. Murphy (Repository Librarian)

Brief Overview of Databases and Domain Theories:

Quick Listing:
1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
    1. Original Version
    2. Standardized-Attribute Version of the Original.
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer D'base, Olvi Mangasarian)
    1. Original version
    2. Diagnostic data set
    3. Prognostic data set
12. bridges (Yoram Reich)
13-21. chess
    1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
    2. Shapiro's endgame database (kr-vs-kp) (Rob Holte)
    3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
    4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
    1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
    2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schaffer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth^2's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)

45. ionosphere information (Vince Sigillito)
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
    1. Original Mechanical Analysis Data Set
    2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology
    1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
    2. splice-junction sequences (Towell, Noordewier, & Shavlik, domain theory also)
    3. protein secondary structure database (Qian and Sejnowski)
    4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (2) (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito)
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadruped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (7) (from Ross King,.)
85. Student Loan relational database (f
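
The file format described in the notes above (one instance per line, comma-separated attribute values, "?" for a missing value, class in the last column) can be read with a few lines of Python. The following is only a minimal sketch under stated assumptions: the helper names parse_value and read_uci are hypothetical, the sample rows are copied from the iris.data excerpt quoted in this guide, and the row containing "?" is invented purely to show missing-value handling (it is not part of the real iris data).

```python
import csv
import io
from collections import Counter

# Rows quoted from the iris.data excerpt in this guide. The last row, with
# a "?", is a made-up example used only to demonstrate missing values.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,?,5.1,1.9,Iris-virginica
"""


def parse_value(token):
    """Map '?' to None, numeric text to float, anything else to str."""
    if token == "?":
        return None
    try:
        return float(token)
    except ValueError:
        return token


def read_uci(stream):
    """Yield (attributes, label) pairs; the last column is the class."""
    for row in csv.reader(stream):
        if not row:  # skip blank lines
            continue
        *attrs, label = (parse_value(tok) for tok in row)
        yield attrs, label


records = list(read_uci(io.StringIO(SAMPLE)))
class_counts = Counter(label for _, label in records)
missing = sum(attr is None for attrs, _ in records for attr in attrs)

print(class_counts)
print("missing values:", missing)
```

To read a downloaded file instead of the inline sample, pass an open file object, for example open("iris.data", newline=""), to read_uci. Treating the last column as the class matches iris, but some data sets in the repository place the class elsewhere, so check the accompanying documentation file first.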
