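Data Mining: Foreign-Language Reference with Translation

(The document contains parallel text: the English original and its Chinese translation.)

Original:

What is Data Mining?

Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, “data mining” should have been more appropriately named “knowledge mining from data”, which is unfortunately somewhat long. “Knowledge mining”, a shorter term, may not reflect the emphasis on mining from large amounts of data. Nevertheless, mining is a vivid term characterizing the process that finds a small set of precious nuggets from a great deal of raw material. Thus, such a misnomer which carries both “data” and “mining” became a popular choice. There are many other terms carrying a similar or slightly different meaning to data mining, such as knowledge mining from databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people treat data mining as a synonym for another popularly used term, “Knowledge Discovery in Databases”, or KDD. Alternatively, others view data mining as simply an essential step in the process of knowledge discovery in databases. Knowledge discovery consists of an iterative sequence of the following steps:

· data cleaning: to remove noise or irrelevant data,
· data integration: where multiple data sources may be combined,
· data selection: where data relevant to the analysis task are retrieved from the database,
· data transformation: where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance,
· data mining: an essential process where intelligent methods are applied in order to extract data patterns,
· pattern evaluation: to identify the truly interesting patterns representing knowledge based on some interestingness measures, and
· knowledge presentation: where visualization and knowledge representation techniques are used to present the mined knowledge to the user.

The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user, and may be stored as new knowledge in the knowledge base. Note that according to this view, data mining is only one step in the entire process, albeit an essential one since it uncovers hidden patterns for evaluation.

We agree that data mining is a step in the knowledge discovery process. However, in industry, in media, and in the database research milieu, the term “data mining” is becoming more popular than the longer term of “knowledge discovery in databases”. Therefore, in this book, we choose to use the term “data mining”. We adopt a broad view of data mining functionality: data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories.

Based on this view, the architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.

2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request.

3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern’s interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).

4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, evolution and deviation analysis.

5. Pattern evaluation module. This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base. Alternatively, the pattern evaluation module may be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is highly recommended to push the evaluation of pattern interestingness as deep as possible into the mining process so as to confine the search to only the interesting patterns.

6. Graphical user interface. This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on the intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From a data warehouse perspective, data mining can be viewed as an advanced stage of on-line analytical processing (OLAP). However, data mining goes far beyond the narrow scope of summarization-style analytical processing of data warehouse systems by incorporating more advanced techniques for data understanding.

While there may be many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that does not handle large amounts of data can at most be categorized as a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values, or that performs deductive query answering in large databases should be more appropriately categorized as either a database system, an information retrieval system, or a deductive database system.

Data mining involves an integration of techniques from multiple disciplines such as database technology, statistics, machine learning, high performance computing, pattern recognition, neural networks, data visualization, information retrieval, image and signal processing, and spatial data analysis. We adopt a database perspective in our presentation of data mining in this book. That is, emphasis is placed on efficient and scalable data mining techniques for large databases.

By performing data mining, interesting knowledge, regularities, or high-level information can be extracted from databases and viewed or browsed from different angles. The discovered knowledge can be applied to decision making, process control, information management, query processing, and so on. Therefore, data mining is considered as one of the most important frontiers in database systems and one of the most promising new database applications in the information industry.

A classification of data mining systems

Data mining is an interdisciplinary field, the confluence of a set of disciplines, including database systems, statistics, machine learning, visualization, and information science. Moreover, depending on the data mining approach used, techniques from other disciplines may be applied, such as neural networks, fuzzy and/or rough set theory, knowledge representation, inductive logic programming, or high performance computing. Depending on the kinds of data to be mined or on the given data mining application, the data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, Web technology, economics, or psychology.

Because of the diversity of disciplines contributing to data mining, data mining research is expected to generate a large variety of data mining systems. Therefore, it is necessary to provide a clear classification of data mining systems. Such a classification may help potential users distinguish data mining systems and identify those that best match their needs. Data mining systems can be categorized according to various criteria, as follows.

1) Classification according to the kinds of databases mined. A data mining system can be classified according to the kinds of databases mined. Database systems themselves can be classified according to different criteria (such as data models, or the types of data or applications involved), each of which may require its own data mining technique. Data mining systems can therefore be classified accordingly. For instance, if classifying according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system. If classifying according to the special types of data handled, we may have a spatial, time-series, text, or multimedia data mining system, or a World-Wide Web mining system. Other system types include heterogeneous data mining systems, and legacy data mining systems.

2) Classification according to the kinds of knowledge mined. Data mining systems can be categorized according to the kinds of knowledge they mine, i.e., based on data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc. A comprehensive data mining system usually provides multiple and/or integrated data mining functionalities. Moreover, data mining systems can also be distinguished based on the granularity or levels of abstraction of the knowledge mined, including generalized knowledge (at a high level of abstraction), primitive-level knowledge (at a raw data level), or knowledge at multiple levels (considering several levels of abstraction). An advanced data mining system should facilitate the discovery of knowledge at multiple levels of abstraction.

3) Classification according to the kinds of techniques utilized. Data mining systems can also be categorized according to the underlying data mining techniques employed. These techniques can be described according to the degree of user interaction involved (e.g., autonomous systems, interactive exploratory systems, query-driven systems), or the methods of data analysis employed (e.g., database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on). A sophisticated data mining system will often adopt multiple data mining techniques or work out an effective, integrated technique which combines the merits of a few individual approaches.

Translation:

What is data mining?

Simply put, data mining is the extraction, or “mining”, of knowledge from large amounts of data. The term is actually something of a misnomer. Note that mining gold from ore or sand is called gold mining, not ore mining. Data mining would therefore be more accurately named “mining knowledge from data”, which is unfortunately rather long. “Knowledge mining” is a shorter term, but it may not convey the sense of mining from large amounts of data. After all, mining is a vivid term that captures the character of a process that finds a few gold nuggets in a large quantity of raw material. Thus this misnomer, carrying both “data” and “mining”, became the popular choice. Several other terms have meanings similar to, but slightly different from, data mining, such as knowledge mining in databases, knowledge extraction, data/pattern analysis, data archaeology, and data dredging.

Many people regard data mining as a synonym for another commonly used term, knowledge discovery in databases, or KDD. Others regard data mining as merely one basic step in the process of knowledge discovery in databases. The knowledge discovery process consists of the following steps:

1) data cleaning: removing noise or inconsistent data,
2) data integration: multiple kinds of data can be combined,
3) data selection: retrieving from the database the data relevant to the analysis task,
4) data transformation: transforming or consolidating the data into a form suitable for mining, for example through summary or aggregation operations,
5) data mining: the basic step, in which intelligent methods are used to extract data patterns,
6) pattern evaluation: identifying, according to some interestingness measure, the truly interesting patterns that represent knowledge,
7) knowledge presentation: using visualization and knowledge representation techniques to present the mined knowledge to the user.

The data mining step can interact with the user or with a knowledge base. Interesting patterns are presented to the user or stored in the knowledge base as new knowledge. Note that, on this view, data mining is only one step in the whole process, although the most important one, because it uncovers hidden patterns.

We agree that data mining is one step in the knowledge discovery process. However, in industry, in the media, and in the database research community, “data mining” is more popular than the longer term “knowledge discovery in databases”. In this book, therefore, the term chosen is data mining. We adopt a broad view of data mining: data mining is the process of mining interesting knowledge from large amounts of data stored in databases or other information repositories.

Based on this view, a typical data mining system has the following major components:

Database, data warehouse, or other information repository: this is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and integration can be performed on the data.

Database or data warehouse server: the database or data warehouse server is responsible for fetching the relevant data according to the user’s data mining request.

Knowledge base: this is the domain knowledge used to guide the search or to evaluate the interestingness of the resulting patterns. Such knowledge may include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge about user beliefs can also be included; it can be used to evaluate a pattern’s interestingness on the basis of its unexpectedness. Other examples of domain knowledge are interestingness constraints or thresholds, and metadata (for example, describing data from multiple heterogeneous sources).

Data mining engine: this is the basic part of a data mining system and consists of a set of functional modules for characterization, association, classification, cluster analysis, and evolution and deviation analysis.

Pattern evaluation module: this component usually employs interestingness measures and interacts with the data mining modules in order to focus the search on interesting patterns. It may use interestingness thresholds to filter the discovered patterns. The pattern evaluation module can also be integrated with the mining module, depending on the implementation of the data mining method used. For efficient data mining, it is recommended to push pattern evaluation as deep as possible into the mining process, so as to restrict the search to interesting patterns.

Graphical user interface: this module communicates between the user and the data mining system, allowing the user to interact with the system by specifying a data mining query or task, providing information to help focus the search, and performing exploratory data mining based on intermediate data mining results. In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms.

From the data warehouse point of view, data mining can be seen as an advanced stage of on-line analytical processing (OLAP). However, by incorporating more advanced techniques for data understanding, data mining goes much further than the summarization-style analytical processing of data warehouses.

Although there are already many “data mining systems” on the market, not all of them can perform true data mining. A data analysis system that cannot handle large amounts of data can at most be called a machine learning system, a statistical data analysis tool, or an experimental system prototype. A system that can only perform data or information retrieval, including finding aggregate values or answering deductive queries in large databases, should be classified as a database system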
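To make the sequence of knowledge discovery steps in the English original more concrete (data cleaning, integration, selection, transformation, mining, pattern evaluation, presentation), here is a minimal Python sketch. It is an illustration only, not a method taken from the text: the toy market-basket records, every function name, the choice of item-pair co-occurrence as the "pattern", and the use of support with a minimum threshold as the interestingness measure are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: two separate sources of market-basket records.
SOURCE_A = [
    {"id": 1, "items": ["bread", "milk", None]},
    {"id": 2, "items": ["bread", "butter"]},
]
SOURCE_B = [
    {"id": 3, "items": ["Milk", "bread", "butter"]},
    {"id": 4, "items": ["milk"]},
    {"id": 5, "items": []},
]


def clean(records):
    """Data cleaning: drop missing items and records left with no items."""
    cleaned = []
    for rec in records:
        items = [item for item in rec["items"] if item]
        if items:
            cleaned.append({"id": rec["id"], "items": items})
    return cleaned


def integrate(*sources):
    """Data integration: combine several sources into one record list."""
    return [rec for source in sources for rec in source]


def select(records, min_items=2):
    """Data selection: keep records relevant to pair analysis (assumed rule)."""
    return [rec for rec in records if len(rec["items"]) >= min_items]


def transform(records):
    """Data transformation: normalise case and turn each record into a set."""
    return [frozenset(item.lower() for item in rec["items"]) for rec in records]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(counts, n_baskets, min_support):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    if n_baskets == 0:
        return {}
    return {
        pair: count / n_baskets
        for pair, count in counts.items()
        if count / n_baskets >= min_support
    }


def present(patterns):
    """Knowledge presentation: render the surviving patterns as text lines."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


def run_kdd(sources, min_support=0.5):
    """Chain the steps in the order the text lists them."""
    records = integrate(*(clean(source) for source in sources))
    baskets = transform(select(records))
    return evaluate(mine(baskets), len(baskets), min_support)


if __name__ == "__main__":
    for line in present(run_kdd([SOURCE_A, SOURCE_B])):
        print(line)
```

In this sketch the evaluation step runs after mining is complete. The text recommends pushing interestingness evaluation as deep as possible into the mining process; a fuller implementation would prune low-support candidates while counting rather than filtering afterwards.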
