《人工智能与数据挖掘教学课件》2.datawarehou_第1页
《人工智能与数据挖掘教学课件》2.datawarehou_第2页
《人工智能与数据挖掘教学课件》2.datawarehou_第3页
《人工智能与数据挖掘教学课件》2.datawarehou_第4页
《人工智能与数据挖掘教学课件》2.datawarehou_第5页
已阅读5页,还剩70页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Data WarehouseWhy Data warehouseThe most common issue companies face when looking at data mining is that the information is not in one place.The biggest challenge business analysts face in using data mining is how to extract, integrate, cleanse, and prepare data to solve their most pressing business

2、 problems.What is Data WarehouseThe idea of a data warehouse is to put a wide range of operational data from internal and external sources into one place so it can be better utilized by executives, line of business managers and other business analysts.Once the information is gathered, OLAP (on-line

3、analytical processing ) software comes into play by providing the desktop analysis tools for querying, manipulating and reporting the data from the data warehouse. Data Warehouse environment the source systems from which data is extracted the tools used to extract data for loading the data warehouse

4、 the data warehouse database itself where the data is stored the desktop query and reporting tools used for decision support Data Warehousing Process Overview Operational Vs. Multidimensional View Of SalesCreating A Data WarehouseThe Data WarehouseThe Data Warehouse is an integrated, subject-oriente

5、d, time-variant, non-volatile database that provides support for decision making.The Data WarehouseIntegratedThe Data Warehouse is a centralized, consolidated database that integrates data retrieved from the entire organization.Subject-Oriented The Data Warehouse data is arranged and optimized to pr

6、ovide answers to questions coming from diverse functional areas within a company.The Data WarehouseTime Variant The Warehouse data represent the flow of data through time. It can even contain projected data.Non-Volatile Once data enter the Data Warehouse, they are never removed.The Data Warehouse is

7、 always growing.Operational Database vs. Data warehouse Operational DBSimilar data can have different representations or meaningsFunctional or process orientationCurrent transactionFrequent updating Data WarehouseUnified view of all data elementsSubject orientation for decision supportHistorical inf

8、ormation with time dimensionData are added without changeData MartA data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people.Data MartData Marts can serve as a test vehicle for companies exploring the potential benefits of Data Warehouses.D

9、ata Marts address local or departmental problems, while a Data Warehouse involves a company-wide effort to support decision making at all levels in the organization.Enterprise Data Warehouse (EDW)A large scare data warehouse that is used across the enterprise for decision supportEDW are used to prov

10、ide data for many types of DSS, including CRM, SCM, BPM, BAM, PLM, and KMS.BPM: Business performance managementBAM: Business activity monitoringPLM: product lifecycle managementKMS: Knowledge management systemsMetadataMetadata is the data about data. In a data warehouse, metadata describe the conten

11、ts of a data warehouse and the manner of its useGood metadata is essential to the effective operation of a data warehouse and it is used in data acquisition/collection, data transformation, and data access. The needs for Technical metadataThe use of data warehousing and decision processing often inv

12、olves a wide range of different products, and creating and maintaining the meta data for these products is time- consuming and error prone.Automating the meta data management process and enabling the sharing of this so-called technical meta data between products can reduce both costs and errors.The

13、Needs for Business metadataBusiness users need to have a good understanding of what information exists in a data warehouse. They need to understand what the information means from a business viewpoint, how it was derived, from what source systems it comes, when it was created, what pre-built reports

14、 and analyses exist for manipulating the information, and so forth. metadata in a data warehouseKimball lists the following types of metadata in a data warehouse:Source system metadataData staging metadataDBMS metadataRalph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0-471-25547

15、-5source system metadata source specifications, such as repositories, and source logical schemas source descriptive information, such as ownership descriptions, update frequencies and access methods process information, such as job schedules and extraction code data staging metadata data acquisition

16、 information, such as data transmission scheduling and results, and file usage dimension table management, such as definitions of dimensions, and surrogate key assignments transformation and aggregation, such as data enhancement and mapping, DBMS load scripts, and aggregate definitions audit, job lo

17、gs and documentation, such as data lineage records, data transform logs Star SchemaThe star schema is a data modeling technique used to map multidimensional decision support into a relational database.Star schemas yield an easily implemented model for multidimensional data analysis while still prese

18、rving the relational structure of the operational database.Star SchemaFour Components:FactsDimensionsAttributesAttribute hierarchiesFigure 13.14 A Three-Dimensional View of Sales Figure 13.17 Attribute Hierarchies in Multidimensional Analysis FactsNumeric measurements that represent specific busines

19、s aspect or activityNormally stored in fact table that is center of star schemaFact table contains facts linked through their dimensionsMetrics are facts computed at run timeDimensionsQualifying characteristics provide additional perspectives to a given factDecision support data almost always viewed

20、 in relation to other dataStudy facts via dimensionsDimensions stored in dimension tablesAttributesDimensions provide descriptions of facts through their attributesNo mathematical limit to the number of dimensionsUse to search, filter, and classify factsSlice and dice: focus on slices of the data cu

21、b for more detailed analysisAttribute HierarchiesProvide top-down data organizationTwo purpose: AggregationDrill-down/roll-up data analysisDetermine how the data are extracted and representedStored in a DBMSs data dictionaryUsed by OLAP tool to access warehouse properly.Star SchemaA star schema cons

22、ists of fact tables and dimension tables. Fact tables contain the quantitative or factual data about a business-the information being queried. This information is often numerical, additive measurements and can consist of many columns and millions or billions of rows.Dimension tables are usually smal

23、ler and hold descriptive data that reflects the dimensions, or attributes, of a business.Figure 13.17 Star Schema For SalesStar Schema RepresentationFacts and dimensions are normally represented by physical tables in the data warehouse database.The fact table is related to each dimension table in a

24、many-to-one (M:1) relationship.Fact and dimension tables are related by foreign keys and are subject to the primary/foreign key constraints.Figure 13.18 Orders Star SchemaStar SchemaPerformance-Improving TechniquesNormalization of dimensional tablesMultiple fact tables representing different aggrega

25、tion levelsDenormalization of fact tablesTable partitioning and replicationFigure 13.19 Normalized Dimension TablesMultiple Fact TablesPracticeHow to design a star schema for an auto insurance company to do risk analysis?What is the Objective?What are the Facts?What are the Dimensions?What are the A

26、ttributes?What are the Attribute hierarchy?Auto insurance DW star schemaData Warehouse Design Grain A definition of the highest level of detail that is supported in a data warehouse Drill-downThe process of probing beyond a summarized value to investigate each of the detail transactions that compris

27、e the summary Data Warehouse ImplementationThe Data Warehouse as an Active Decision Support NetworkA Company-Wide Effort that Requires User Involvement and Commitment at All LevelsSatisfy the Trilogy: Data, Analysis, and UsersApply Database Design ProceduresData Warehouse Implementation Implementing

28、 a data warehouse is generally a massive effort that must be planned and executed according to established methodsThere are many facets to the project lifecycle, and no single person can be an expert in each area Data Warehouse Implementation Road MapData Integration and the Extraction, Transformati

29、on, and Load (ETL) ProcessData integration comprises three major processes: data access (the ability to access and extract data from any data source)data federation (the integration of business views across multiple data stores), and change capture (the identification, capture , and delivery of the

30、changes made to enterprise data sources). Data Integration and the Extraction, Transformation, and Load (ETL) ProcessExtraction, transformation, and load (ETL)Extraction - reading data from a databaseTransformation - converting the extracted data from its previous form into the form that can be plac

31、ed into a data warehouse Load - putting the data into the data warehouseData Integration and the Extraction, Transformation, and Load (ETL) ProcessData CleanseData cleansing or data scrubbing is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table,

32、 or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant etc. parts of the data and then replacing, modifying or deleting this dirty data.ETL toolsA good ETL tool must be able to communicate with the many different relational databases and

33、read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation and loading of data. Many ETL vendors now have data prof

34、iling, data quality and metadata capabilities.On-Line Analytical ProcessingOn-Line Analytical Processing (OLAP) is an advanced data analysis environment that supports decision making, business modeling, and operations research activities. Four Main Characteristics of OLAPUse multidimensional data an

35、alysis techniques.Provide advanced database support.Provide easy-to-use end user interfaces.Support client/server architecture.On-Line Analytical ProcessingAdditional Functions of Multidimensional Data Analysis TechniquesAdvanced data presentation functionsAdvanced data aggregation, consolidation, a

36、nd classification functionsAdvanced computational functionsAdvanced data modeling functionsIntegration Of OLAP With A Spreadsheet ProgramFigure 13.7 OLAP Server ArrangementSAPs Business Information Warehouse:an Enterprise-Wide Information HubAn end-to-end enterprise-wide information hub to support p

37、lanning and decision-making.A central data repository of SAP, non-SAP, current, and historical business transactions and meta data.Timely information to all levels and roles, from analyst to executive.Years of SAP financial, logistic, and human resource information systems experience wedded with mod

38、ern data warehouse methodologies. SAP AG 1999 / 2BW Architecture detailsR/3 OLTP ApplicationsR/3 OLTP ApplicationsOLTPReportingOLTPReportingProduction DataExtractorProduction DataExtractorBusiness InformationWarehouse ServerStagingStagingBAPIBAPIBusiness ExplorerAnalyzer(hosted by MS Excel)Analyzer(

39、hosted by MS Excel)BrowserBrowserNon R/3 Production DataExtractorNon R/3 Production DataExtractorNon R/3 OLTP ApplicationsNon R/3 OLTP Applications3rd party OLAP client3rd party OLAP clientData ManagerData ManagerOperationalData Store3rd party OLAP client3rd party OLAP client3rd party OLAP clients3r

40、d party OLAP clientsMeta Data ManagerMeta Data ManagerStaging EngineStaging EngineAdministratorWorkbenchAdministrationAdministrationSchedulingSchedulingMonitorMonitorOLAP ProcessorOLAP ProcessorMeta DataRepositoryMeta DataRepositoryInfoCubesOLE-DB for OLAP ProviderOLE-DB for OLAP ProviderODBOODBOBAP

41、IBAPIData ProviderServerData ProviderServerRemoteCubeRemoteCubeBAPIBAPIStagingStagingBAPIBAPIPSAA Sample Of Current Data Warehousing And Data Mining VendorsTable 13.10Success Stories at PepsiUsing the data warehouse, weve been able to identify important items, find national suppliers for them, and l

42、everage those relationships to reduce costs.“Thanks to the warehouse, Pepsi can monitor purchasing compliance at the user level, an ability that has boosted price and product compliance well over 90 percent.The warehouse also helps ensure 100 percent sales tax compliance, says Bridgman. Since going

43、online in 1995, the warehouse has helped generate procurement savings in excess of $100 million.Levels of DW Support for Enterprise Decision MakingThe need for real-time dataA business often cannot afford to wait a whole day for its operational data to load into the data warehouse for analysisProvid

44、es incremental real-time data showing every state change and almost analogous patterns over timeMaintaining metadata in sync is possibleLess costly to develop, maintain, and secure one huge data warehouse so that data are centralized for BI/BA toolsAn EAI with real-time data collection can reduce or

45、 eliminate the nightly batch processes Real-Time / Active Data Warehouse (RDW/ADW)Loading and and providing data via the data warehouse as they become available.Expand traditional data warehouse functions into the realm of tactical decision makingEmpower decision making when interact directly with customers and suppliers.Real

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论