Graduate Course Paper (    -     academic year, Semester 2)

Title: Summary of Data Mining Technology
Graduate student: 花君林 (Hua Junlin)
Submission date: June 24 (year left blank)
Graduate student's signature: 花君林
School: Chemical Engineering
Student ID: S16225048
Course: Professional English
Instructor: 倪伶俐 (Ni Lingli)
Instructor's comments:
Grade:     (out of 100)
Instructor's signature:            Date:

Notes

1. The course paper must include a title, the author's name, an abstract, keywords, the main text, and references. The title is chosen by the graduate student on the basis of the course content. The abstract should be no more than 500 characters, with 3 to 5 keywords and at least 10 references.
2. The paper must be the student's own work. A paper found to have been downloaded from the Internet or plagiarized from someone else's article is treated as cheating, and the grade for the course is recorded as 0.
3. The course paper is printed double-sided on A4 paper. For papers in Chinese, all text is set in simplified SimSun (宋体): the title in bold Small 2 size, headings in bold Small 4 size, and body text in Small 4 size. With the instructor's approval the paper may be written in English, set entirely in Times New Roman: the title in 18-point bold, headings in 14-point bold, and body text in 12-point. Line spacing is double (so that the instructor can annotate). Margins are 3 cm left, 2 cm right, 2.5 cm top, and 2.5 cm bottom. Other formatting follows the requirements for degree theses.
4. Length, content, and similar requirements are specified by the instructor.

Summary of Data Mining Technology

Wang Mengxue
(Chengdu University of Technology, Chengdu 610059, China)

Abstract: With the development of computer and network technology, it has become very easy to obtain relevant information. Traditional statistical methods, however, cannot complete the analysis of such large volumes of large-scale data. Data mining technology, which intelligently combines a variety of statistical analysis, database, and intelligent-language techniques to analyze very large data sets, therefore came into being. This paper mainly introduces the basic concepts and methods of data mining. Its applications and development prospects are also described.

Keywords: data mining; method; application; prospects

1 Introduction

With the rapid development of information technology, databases have kept growing in scale and now hold enormous amounts of data. A great deal of important information is hidden behind this surge of data, and people want to analyze it at a higher level in order to make better use of it. To give decision makers a unified, global view, data warehouses have been established in many fields. But the sheer volume of data often makes it impossible to identify the hidden information that could support decision-making, and traditional query and reporting tools cannot meet the need to mine this information. A new data analysis technology was therefore needed to process large amounts of data and extract valuable latent knowledge from them, and data mining technology came into being. Data mining technology has developed and gradually matured alongside data warehouse technology.

2 Data Mining Technology

2.1 Definition of data mining

Data mining refers to the non-trivial process of automatically extracting useful
information hidden in the data from a data set. This information is expressed as rules, concepts, regularities, and patterns. It helps decision makers analyze historical and current data and discover hidden relationships and patterns in order to predict behavior that may occur in the future. The process of data mining is also called knowledge discovery. It is an interdisciplinary subject that involves databases, artificial intelligence, mathematical statistics, visualization, and parallel computing. Data mining is a new information processing technology. Its main feature is that it extracts, transforms, analyzes, and models large amounts of data in a database in order to extract the key data that support decision-making. Data mining is an important technology in KDD (Knowledge Discovery in Databases). It does not use a standard database query
language (such as SQL) to retrieve records; instead, it summarizes the patterns in the queried content and searches for its inherent regularities. Traditional query and report processing yields only the outcome of an event and does not examine in depth why it occurred. Data mining mainly seeks to understand the causes of an event and to forecast the future with a certain degree of confidence, which provides strong support for decision-making.

2.2 Methods of data mining

Data mining research draws on the techniques and results of many different disciplines, so current data mining methods take a variety of forms. From the perspective of statistical analysis, the data mining models used in statistical analysis techniques include linear and non-linear analysis, regression analysis, logistic regression analysis, univariate analysis, multivariate analysis, time series analysis, sequence analysis, nearest-neighbor algorithms, and cluster analysis. Using these techniques, one can examine data that take unusual forms and then interpret them with various statistical and mathematical models to explain the market rules
and business opportunities hidden behind those data. Knowledge-discovery data mining techniques are entirely different from statistical-analysis data mining techniques. They include artificial neural networks, support vector machines, decision trees, genetic algorithms, rough sets, rule discovery, and association sequences.

2.2.1 Statistical methods

Traditional statistics provides a number of discriminant and regression analysis methods for data mining. Commonly used techniques include Bayesian reasoning, regression analysis, and analysis of variance. Bayesian reasoning is a basic tool for revising the probability distribution of a data set once new information is known, and it is used for classification problems in data mining. Regression analysis is used to find the model that best describes the relationship between input variables and an output variable. It includes linear regression, which describes the trend of one variable and its relationship to other variables, and logistic regression, which predicts the occurrence of certain events. Analysis of variance is generally used to assess the performance of the estimated regression line and the influence of the independent variables on the final regression result. It is one of the powerful tools in many mining applications.

2.2.2 Association rules

An association rule is a simple and practical kind of analysis rule that describes the regularities and patterns with which certain attributes occur together in one thing. Association rule mining is one of the most mature and important technologies in data mining. It was first proposed by R. Agrawal et al. The most classical association rule mining algorithm is Apriori, which first mines all frequent itemsets and then generates association rules from them. Many frequent-itemset mining algorithms evolved from it. Association rules are widely used in data mining to find meaningful relationships in large data sets; one of the reasons is that it
is not limited to choosing a single dependent variable. The most typical application of association rules in data mining is shopping basket analysis. Most association rule mining algorithms can discover all of the associations hidden in the mined data, and the number of resulting rules is often very large. However, not all of the relationships between attributes obtained in this way have practical value, so it is particularly important to evaluate these rules effectively and to screen out the meaningful ones that the user is really interested in.

2.2.3 Clustering analysis

Cluster analysis divides samples into several groups according to a chosen similarity criterion, so that samples in the same group are highly similar and samples in different groups differ. Commonly used techniques include divisive algorithms, agglomerative algorithms, partition-based clustering, and incremental clustering. Clustering is suited to exploring the internal relationships among samples so that the structure of the sample can be evaluated reasonably. Cluster analysis is also used to detect isolated points (outliers). Sometimes clustering is not intended to get objects
together but to make it easier to separate an object from the others. Cluster analysis has been applied in many areas, such as economic analysis, pattern recognition, and image processing, and especially in business, where it can help marketers discover groups with different characteristics within their customer base. Besides the choice of algorithm, the key to cluster analysis is the choice of the measure applied to the samples. Not all of the classes produced by a clustering algorithm are useful for decision making. Before an algorithm is applied, the clustering tendency of the data is usually checked first.

2.2.4 Decision tree method

Decision tree learning is a method of approximating discrete-valued target functions. It classifies an instance by sorting it from the root node down to a leaf node, and the leaf node gives the classification of the instance. Each node
on the tree specifies a test of some attribute of the instance, and each branch descending from the node corresponds to one possible value of that attribute. An instance is classified by starting at the root node of the tree, testing the attribute specified by that node, and then moving down the branch that corresponds to the value of that attribute in the given instance. The decision tree method is mainly applied to classification in data mining.

2.2.5 Neural network

A neural network is a self-learning mathematical model. It can analyze large amounts of complex data and carry out pattern extraction and trend analysis that are extremely complex for the human brain or for other computer methods. A neural network can take the form of supervised learning or of unsupervised clustering; in either case it operates on the values entered into the network. An artificial neural network simulates the structure of
human brain neurons. Based on the MP model and the Hebb learning rule, three kinds of neural network have been established. They feature non-linear mapping, information storage, parallel processing, and global collective action, together with a high capacity for self-learning, self-organization, and adaptation. Feedforward neural networks, represented by the perceptron and the BP (back-propagation) network, can be used for classification and prediction. Feedback networks, represented by the Hopfield network, are used for associative memory and optimization. Self-organizing networks, represented by the ART model and the Kohonen model, are used for clustering.

2.2.6 Support vector machine

The support vector machine (SVM) is a new machine learning method developed on the basis of statistical learning theory. It is based on the principle of structural risk minimization and seeks to improve the generalization ability of the learning machine as far as possible. It offers good generalization performance and good classification accuracy, can effectively solve the learning problem, and has become an alternative to training multi-layer perceptrons, RBF neural networks, and polynomial neural networks. In addition, training a support vector machine is a convex optimization problem, so a locally optimal solution is necessarily the globally optimal solution. Other algorithms, including neural networks, do not have these properties. Support vector machines can be applied to classification, regression, the exploration of unknown things
and so on in data mining.

In addition to the above methods, there are visualization techniques that convert data and results into visual form, cloud model methods, and inductive logic programming.

In fact, the mining method used by any mining tool is usually selected according to the specific problem. It is difficult to
say which method is better and which is worse; it depends on the specific problem.

2.3 Data mining process

Data mining can be divided into three main stages: data preparation, data mining, and evaluation and expression of results. The last stage can be further broken down into evaluation, pattern interpretation, knowledge consolidation, and use of knowledge. Knowledge discovery in databases is a multi-step process in which these three stages are repeated.

2.3.1 Data preparation

KDD works on large amounts of data, which are generally stored in database systems and are the result of long-term accumulation. Such data are often not suitable for direct knowledge mining and must first be prepared. Data preparation generally includes selection (choosing the relevant data), cleaning (eliminating noise in the data), inference (estimating missing data), conversion (converting between discrete and continuous values, grouping and classifying data values, computing combinations of data items, and so on), and data reduction (reducing the volume of data). This work is often done when the data warehouse is built. Data preparation is the first step in KDD. How well it is done affects the efficiency and accuracy of data mining and the effectiveness of the final model.

2.3.2 Data mining

Data mining is the most critical step in KDD and also the most technically difficult. Most KDD researchers study data mining technology. The most widely used techniques include decision trees, classification, clustering, rough sets, association rules, neural networks, and genetic algorithms. According to the goal of KDD, data mining selects the parameters of the corresponding algorithm, analyzes the data, and obtains the patterns from which knowledge may be formed.

2.3.3 Evaluation and expression of results

Evaluating patterns: the patterns obtained above may have no practical significance or use value, may fail to reflect the true meaning of the data accurately, and in some cases may even contradict the facts, so they need to be evaluated to determine which are valid and useful. Evaluation can draw on years of experience, and some patterns can also be tested for accuracy directly against data. This step also includes presenting the patterns to the user in an easy-to-understand manner.

Consolidating knowledge: patterns that the user understands and considers realistic and valuable form knowledge. The consistency of this knowledge must also be checked, and any conflicts or contradictions with previously obtained knowledge resolved, so that the knowledge is consolidated.

The
use of knowledge: knowledge is discovered in order to be used, and making it usable is one of the steps of KDD. Knowledge can be used in two ways. In one, the relationships or results described by the knowledge itself are enough to support decision-making. In the other, the knowledge is applied to new data, which may raise new problems and require the knowledge to be refined further. The KDD process may need to be repeated several times. Whenever a step does not meet the expected target, one goes back to the previous step, readjusts, and re-executes.

3 Data mining applications

The potential applications of data mining are very broad. They include government decision-making, business management, scientific research, and decision support for industrial enterprises.

3.1 Applications in scientific research

From the point of view of research methodology, scientific research can be divided
into three categories: theoretical science, experimental science, and computational science. Computational science is an important hallmark of modern science. Computational scientists work with data and analyze a wide variety of experimental or observational data every day. With advanced scientific data collection tools such as observation satellites, remote sensors, and DNA molecular technology, the amount of data has become so large that traditional data analysis tools are powerless, and powerful, intelligent, automatic data analysis tools are needed. Data mining has a very famous application system in astronomy: SKICAT (Sky Image Cataloging and Analysis Tool). It was developed by the Jet Propulsion Laboratory of the California Institute of Technology (the laboratory that designed the Mars rover) together with astronomers, to help astronomers discover distant quasars. SKICAT is both the first successful data mining application and one of the first successful applications of artificial intelligence in astronomy and space science. Using SKICAT, astronomers have discovered 16 new, extremely distant quasars, which help them better study the formation of quasars and the structure of the early universe. In biology, data mining is applied mainly in molecular biology, especially genetic engineering. In gene research there is a well-known international research project, the Human Genome Project.

3.2 Applications in business

Data mining has been used with particular success in the business sector, especially in the retail industry. With the widespread commercial use of MIS systems, and especially of bar code technology, large amounts of data on purchases can be collected, and the volume of data is surging. Data mining technology can give managers a sound basis for decisions, which is of great help in promoting sales and improving competitiveness.

3.3 Applications in finance

In the financial sector the amount of data is very large: the volume of transaction data stored by banks, securities companies, and similar institutions
is great. Banks also lose a very large amount each year to credit card fraud, so data mining can be used to analyze customers' creditworthiness. Typical areas of financial analysis include investment assessment and stock market forecasting.

3.4 Applications in medicine

Data mining is applied very widely in medicine. From molecular medicine to medical diagnosis, data mining can be used to improve efficiency and effectiveness. In drug synthesis, analyzing the chemical structure of drug molecules can determine which atoms or atomic groups in a drug act on a disease, so that when a new drug is synthesized, its molecular structure can indicate which diseases it may treat.

Data mining can also be used in industry, agriculture, transportation, telecommunications, the military, the Internet, and other sectors.
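
As a small worked illustration of the Bayesian reasoning mentioned in Section 2.2.1, the following Python sketch revises a prior probability once new information is known. The fraud scenario, the function name `posterior`, and every number used are hypothetical and chosen only for illustration; none of them comes from the paper.

```python
from fractions import Fraction


def posterior(prior, likelihood_if_true, likelihood_if_false):
    """Bayes' rule: P(H | E) from P(H), P(E | H) and P(E | not H)."""
    # Total probability of seeing the evidence under both hypotheses.
    evidence = prior * likelihood_if_true + (1 - prior) * likelihood_if_false
    return prior * likelihood_if_true / evidence


# Hypothetical figures (assumptions, not data from the paper):
#   1% of transactions are fraudulent,
#   a screening rule flags 90% of fraudulent transactions,
#   and it also flags 5% of legitimate ones.
p_fraud_given_flag = posterior(Fraction(1, 100), Fraction(9, 10), Fraction(1, 20))
print(p_fraud_given_flag)
```

Under these assumed figures the revised probability is 2/13, or about 15%, so a flagged transaction is far more suspicious than the 1% prior suggests but is still more likely legitimate than fraudulent. Exact fractions are used here only to keep the arithmetic easy to check by hand.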
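
The two phases of Apriori described in Section 2.2.2 (first mine all frequent itemsets, then generate rules from them) can be sketched in Python roughly as follows. This is a minimal, unoptimized sketch and not the original formulation by Agrawal et al.: the function names, the use of an absolute count `min_count` as the support threshold, and the shopping baskets are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep only the frequent ones.
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                level[cand] = count
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules above the threshold."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                # confidence(lhs -> rest) = count(itemset) / count(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = apriori(baskets, min_count=3)
rules = association_rules(frequent, min_confidence=0.7)
```

With these baskets only one pair, bread and milk, reaches the threshold, and it yields two rules (bread to milk and milk to bread), each with confidence 3/4. The confidence threshold is one simple way to screen the rule set, in the spirit of the paper's remark that not every discovered association has practical value.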
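
The classification procedure described in Section 2.2.4 (start at the root, test the attribute named by the node, follow the branch for the instance's value, and stop at a leaf) can be sketched in Python as below. The nested-dictionary representation, the function name `classify`, and the weather-style attributes are hypothetical choices for illustration. The sketch covers only the use of an already built tree, not the learning of the tree from data.

```python
def classify(node, instance):
    """Walk a decision tree from the root to a leaf and return the leaf's label."""
    # An internal node is a dict naming the attribute it tests;
    # a leaf is stored as a plain label.
    while isinstance(node, dict):
        value = instance[node["attribute"]]  # test the attribute at this node
        node = node["branches"][value]       # follow the branch for that value
    return node


# A hand-written tree with hypothetical attributes (assumed, not from the paper).
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "wind",
            "branches": {"strong": "no", "weak": "yes"},
        },
    },
}

label = classify(tree, {"outlook": "sunny", "humidity": "normal", "wind": "weak"})
```

Each internal node tests exactly one attribute and each branch corresponds to one of its possible values, matching the description in the text. An attribute value with no branch would raise a `KeyError` in this sketch; a fuller implementation would need a policy for unseen values.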
