
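UCI Database Usage Notes

Usage notes for the UCI data sets in machine learning

This directory contains data sets and domain theories (the latter are marked as such in the brief listing below) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains records of many individual instances, each described by attribute-value pairs. The corresponding *.info file contains extensive documentation. (Some files _generate_ databases; they have no *.data file.) In addition to the data sets and domain theories, the utilities directory contains material you may find useful when working with these data sets.

The UCI data sets can be viewed and copied remotely over the web at /mlearn/MLRepository.html. Alternatively, they can be obtained by ftp (ftp:/ .) using an anonymous login; they are in the pub/machine-learning-databases directory.

UCI is always looking for new data sets, which can be written to the incoming subdirectory. Please contribute your data along with the corresponding documentation. Thank you! For the contribution procedure, see the DOC-REQUIREMENTS file. Currently, most data sets use the following format: one instance per line, no spaces, attribute values separated by commas (","), and missing values denoted by a question mark ("?"). After making a contribution, please notify the site librarian: ml-repository@ics.uci.edu.

Below, the IRIS data set from UCI is used as an example. The iris directory contains three files: Index, iris.data and iris.names.

Index is the directory listing: it lists every file in the folder. For iris, Index contains:

    Index of iris

    18 Mar 1996      105  Index
    08 Mar 1993     4551  iris.data
    30 May 1989     2604  iris.names

iris.data is the iris data file. An excerpt:

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    ...
    7.0,3.2,4.7,1.4,Iris-versicolor
    6.4,3.2,4.5,1.5,Iris-versicolor
    6.9,3.1,4.9,1.5,Iris-versicolor
    ...
    6.3,3.3,6.0,2.5,Iris-virginica
    5.8,2.7,5.1,1.9,Iris-virginica
    7.1,3.0,5.9,2.1,Iris-virginica
    ...

As shown, attribute values are separated only by commas, with no spaces (5.1,3.5,1.4,0.2), and the last column holds the value that the row's attributes map to, i.e., the decision (class) attribute.

iris.names describes the iris data: its title, source, past usage, recent information, number of instances, attributes, and so on. Part of it is shown below:

    7. Attribute Information:
       1. sepal length in cm
       2. sepal width in cm
       3. petal length in cm
       4. petal width in cm
       5. class:
          -- Iris Setosa
          -- Iris Versicolour
          -- Iris Virginica

    9. Class Distribution: 33.3% for each of 3 classes.

For examples of how these data are used, see other papers or the later sections of this site.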
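The format rules and the iris example above can be made concrete with a short Python sketch. It assumes the iris layout (every attribute numeric, class label in the last column, ? for a missing value); parse_line, format_line, load and class_distribution are hypothetical helper names, not part of any UCI distribution.

```python
from collections import Counter

MISSING = "?"  # UCI convention: a missing attribute value is written as "?"


def parse_line(line):
    """Split one instance line into (attribute values, class label).

    Assumes the iris layout: comma-separated fields, no spaces, numeric
    attributes, class label in the last column. "?" becomes None.
    """
    *raw_values, label = line.strip().split(",")
    values = [None if v == MISSING else float(v) for v in raw_values]
    return values, label


def format_line(values, label):
    """Inverse of parse_line: one instance per line, commas, "?" for missing."""
    fields = [MISSING if v is None else str(v) for v in values]
    return ",".join(fields + [label])


def load(lines):
    """Parse every non-blank line (skips, e.g., an empty line at end of file)."""
    return [parse_line(line) for line in lines if line.strip()]


def class_distribution(records):
    """Fraction of instances carrying each class label."""
    counts = Counter(label for _, label in records)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


sample = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

records = load(sample.splitlines())
print(records[0])  # ([5.1, 3.5, 1.4, 0.2], 'Iris-setosa')
print(format_line([5.1, None, 1.4, 0.2], "Iris-setosa"))  # 5.1,?,1.4,0.2,Iris-setosa
print(class_distribution(records))
```

To read the actual file, something like "with open('iris.data') as f: records = load(f)" would do; on the full file, class_distribution should match the 33.3% per class reported in section 9 of iris.names. Data sets with nominal attributes would need per-column handling based on their .names file instead of the blanket float() conversion used here.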

This is the UCI Repository Of Machine Learning Databases and Domain Theories

4 December 1995: pub/machine-learning-databases/mlearn/MLRepository.html
Librarian: Patrick M. Murphy ( )

We have 111 databases and domain theories (36MB).

This directory contains data sets and domain theories (the latter being annotated as such in the following brief listing) that have been or can be used to evaluate learning algorithms. Each data file (*.data) contains individual records described in terms of attribute-value pairs. The corresponding *.info file contains voluminous documentation. (Some files _generate_ databases; they do not have *.data files.) In addition to data sets and domain theories, the utilities/ directory contains utilities that you may find useful when using data sets in this repository.
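The anonymous-ftp route described in this guide (user id anonymous, your e-mail address as the password, files under pub/machine-learning-databases) could be scripted with Python's standard ftplib module. This is a minimal sketch: the host name is not preserved in this copy, so it is a parameter, and the per-data-set subdirectory layout (for example iris/iris.data) is an assumption based on the iris example. fetch_file and download are hypothetical helper names.

```python
import ftplib
import io

# Directory named in this guide. The per-data-set subdirectory (e.g. "iris")
# is an assumption based on the iris example, not something the README states.
REMOTE_ROOT = "pub/machine-learning-databases"


def fetch_file(ftp, dataset, filename, email):
    """Download one repository file over anonymous ftp and return its bytes.

    ftp is a connected ftplib.FTP object (or any object with the same
    login/cwd/retrbinary methods). As the README asks, the user id is
    "anonymous" and the password is your e-mail address.
    """
    ftp.login(user="anonymous", passwd=email)
    ftp.cwd(f"{REMOTE_ROOT}/{dataset}")
    buffer = io.BytesIO()
    ftp.retrbinary(f"RETR {filename}", buffer.write)  # binary transfer into memory
    return buffer.getvalue()


def download(host, dataset, filename, email):
    """Connect to host (hypothetical: not preserved in this copy) and fetch one file."""
    with ftplib.FTP(host) as ftp:
        return fetch_file(ftp, dataset, filename, email)
```

For instance, download("ftp.example.org", "iris", "iris.data", "user@host") would return the file's bytes, where the host is only a placeholder. Passing the connection into fetch_file keeps the login and transfer steps testable without a network.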

The contents of this repository can be viewed and remotely copied over the web. The address is /mlearn/MLRepository.html.

Alternatively, the contents of this repository can be remotely copied via ftp to . Enter anonymous for user id, and your e-mail address (user@host) for password. These databases can be found by executing cd pub/machine-learning-databases.

Notes:

1. We're always looking for additional databases, which can be written to the sub-directory named /incoming. Please send yours, with documentation. Thanks! See DOC-REQUIREMENTS for suggested documentation procedures. Presently, most databases have the following format: 1 instance per line, no spaces, commas separate attribute values, and missing values are denoted by ?. Also, please notify the site librarian ( ) after making a donation.

2. Ivan Bratko requested that the databases he donated from the Ljubljana Oncology Institute (e.g., breast-cancer, lymphography, and primary-tumor) have restricted access. We are allowed to share them with academic institutions upon request. These databases (like several others) require that proper citations be made in published articles that use them. Citation requirements are in each database's corresponding *.doc file. To access any of these databases, send email to . To aid you in deciding whether you want any of these databases, the documentation files are available.

3. An archive server may now be used to receive files in this repository via e-mail. Installed on ics, it provides e-mail access to files in our anonymous ftp/uucp area (ftp). If people have no other access to our archives, they can send mail to:
   Commands to the server may be given in the body. Some commands are:
      help
      send
      find
   The help command replies with a useful help message.

If you publish material based on databases obtained from this repository, then, in your acknowledgements, please note the assistance you received by using this repository. Thanks; this will help others to obtain the same data sets and replicate your experiments. We suggest the following pseudo-APA reference format for referring to this repository (LaTeX'd):

   Murphy, P.M., & Aha, D.W. (1994). {\it UCI Repository of machine learning databases} /mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.

Patrick M. Murphy (Repository Librarian)

Brief Overview of Databases and Domain Theories:

Quick Listing:
1. annealing (David Sterling and Wray Buntine)
2. Artificial Characters Database & DT (donated by Attilio Giordana)
3-4. audiology (Ray Bareiss and Bruce Porter, used in Protos)
   1. Original Version
   2. Standardized-Attribute Version of the Original
5. auto-mpg (from CMU StatLib library)
6. autos (Jeff Schlimmer)
7. badges (Haym Hirsh)
8. balance-scale (Tim Hume)
9. balloons (Michael Pazzani)
10. breast-cancer (Ljubljana Institute of Oncology, restricted access)
11. breast-cancer-wisconsin (Wisconsin Breast Cancer Database, Olvi Mangasarian)
   1. Original version
   2. Diagnostic data set
   3. Prognostic data set
12. bridges (Yoram Reich)
13-21. chess
   1. Partial generator of Quinlan's chess-end-game data (kr-vs-kn) (Schlimmer)
   2. Shapiro's endgame database (kr-vs-kp) (Rob Holte)
   3. king-rook-vs-king (Michael Bain, Arthur van Hoff)
   4-9. Six domain theories (Nick Flann)
22. Bach Chorales (time-series) database (Darrell Conklin)
23. Connect-4 Database (John Tromp)
24-25. Credit Screening Database
   1. Japanese Credit Screening Data and domain theory (Chiharu Sano)
   2. Credit Card Application Approval Database (Ross Quinlan)
26. Ein-Dor and Feldmesser's cpu-performance database (David Aha)
27. Diabetes Data (Serdar Uckun, AIM-94)
28. dgp-2 data generation program (Powell Benedict)
29. Document Understanding (Donato Malerba)
30. Nine small EBL domain theories and examples in sub-directory ebl
31. Evlin Kinney's echocardiogram database (Steven Salzberg)
32. flags (Richard Forsyth)
33. function-finding (Cullen Schaffer's 352 case studies)
34. glass (Vina Spiehler)
35. hayes-roth (from Hayes-Roth^2's paper)
36-39. heart-disease (Robert Detrano)
40. hepatitis (G. Gong)
41. horse colic database (Mary McLeish & Matt Cecile)
42. (Boston) Housing database (from CMU StatLib library)
43. ICU data (Serdar Uckun, AIM-94)
44. Image segmentation database (Carla Brodley)
45. ionosphere information (Vince Sigillito)
46. iris (R.A. Fisher, 1936)
47. isolet (Ron Cole and Mark Fanty's database donated by Tom Dietterich)
48. kinship (J. Ross Quinlan)
49. labor-negotiations (Stan Matwin)
50-51. led-display-creator (from the CART book)
52. lenses (Cendrowska's database donated by Benoit Julien)
53. letter-recognition database (created and donated by David Slate)
54. liver-disorders (BUPA Medical's database donated by Richard Forsyth)
55. logic-theorist (Paul O'Rorke)
56. lung cancer (Stefan Aeberhard)
57. lymphography (Ljubljana Institute of Oncology, restricted access)
58-59. mechanical-analysis (Francesco Bergadano)
   1. Original Mechanical Analysis Data Set
   2. PUMPS DATA SET
60. mobile robots (donated by Klingspor, Morik and Rieger)
61-64. molecular-biology
   1. promoter sequences (Towell, Shavlik, & Noordewier, domain theory also)
   2. splice-junction sequences (Towell, Noordewier, & Shavlik, domain theory also)
   3. protein secondary structure database (Qian and Sejnowski)
   4. protein secondary structure domain theory (Jude Shavlik & Rich Maclin)
65. MONK's Problems (donated by Sebastian Thrun)
66. Moral Reasoner Database (donated by James Wogulis)
67. mushroom (Jeff Schlimmer)
68. MUSK databases (donated by Tom Dietterich)
69. othello domain theory (Tom Fawcett)
70. Page Blocks Classification (Donato Malerba)
71. Pima Indians diabetes diagnoses (Vince Sigillito)
72. Postoperative Patient data (Jerzy W. Grzymala-Busse)
73. Primary Tumor (Ljubljana Institute of Oncology, restricted access)
74. Qualitative Structure Activity Relationships (QSARs) (Ross King)
75. Quadruped Animals (John H. Gennari)
76. Servo data (Ross Quinlan)
77. shuttle-landing-control (Bojan Cestnik)
78. solar flare (Gary Bradshaw)
79-80. soybean (from Ryszard Michalski's groups)
81. space shuttle databases (David Draper)
82. spectrometer (Infra-Red Astronomy Satellite Project Database, John Stutz)
83. Sponge Database (Iosune Uriz and Marta Domingo)
84. Statlog Project databases (from Ross King,.)
85. Student Loan relational database (from Michael Pazza
