数据仓库的概念课件_第1页
数据仓库的概念课件_第2页
数据仓库的概念课件_第3页
数据仓库的概念课件_第4页
数据仓库的概念课件_第5页
已阅读5页,还剩49页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

DataWarehousing2022/10/281DataWarehousing2022/10/221

1.传统数据库以及OLTP(On-LineTransaction

Processing联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。

2.因为,管理人员常常希望能够通过对组织中的大量数据进行分析,了解业务的的发展趋势。而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。

3.为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(DataWarehose)。数据仓库技术出现的背景2022/10/2821.传统数据库以及OLTP(On-LineTrans1--Whatisadatawarehouse?2022/10/2831--Whatisadatawarehouse?2WhatisDataWarehouse?Definedinmanydifferentways,butnotrigorously.Adecisionsupportdatabasethatismaintainedseparatelyfromtheorganization’soperationaldatabaseSupportinformationprocessingbyprovidingasolidplatformofconsolidated,historicaldataforanalysis.“Adatawarehouseisa

subject-oriented,integrated,time-variant,andnonvolatile

collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.Inmon2022/10/284WhatisDataWarehouse?Defined数据仓库的定义与基本特性

WilliamH.Inmon在1993年所写的论著《BuildingtheDataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。文中他将数据仓库定义为:adatawarehouseisasubject-oriented,integrated,non-volatile,time-variantcollectionofdatainsupportofmanagementdecisions.

一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。2022/10/285数据仓库的定义与基本特性WilliamH.Inmon在DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.2022/10/286DataWarehouse—Subject-OrienteDataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.2022/10/287DataWarehouse—IntegratedConstDataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.2022/10/288DataWarehouse—TimeVariantTheDataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.2022/10/289DataWarehouse—Non-VolatileApDataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis2022/10/2810DataWarehousevs.HeterogeneoDataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries2022/10/2811DataWarehousevs.OperationalOLTPvs.OLAP2022/10/2812OLTPvs.OLAP2022/10/2212WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled2022/10/2813WhySeparateDataWarehouse?Hi2--Amulti-dimensionaldatamodel2022/10/28142--Amulti-dimensionaldatamoFromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.2022/10/2815FromTablesandSpreadsheetstCube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid2022/10/2816Cube:ALatticeofCuboidsalltAConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity2022/10/2817AConceptHierarchy:DimensionMultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay2022/10/2818MultidimensionalDataSalesvolASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum

TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum2022/10/2819ASampleDataCubeTotalannualCuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid2022/10/2820CuboidsCorrespondingtotheC3--ConceptualModelingofDataWarehouses2022/10/28213--ConceptualModelingofDatModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakegalaxyschema:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschema2022/10/2822Modelingdatawarehouses:dimeExampleofStarSchema

time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch2022/10/2823ExampleofStarSchematime_ExampleofSnowflakeSchematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSalesFactTable

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity2022/10/2824ExampleofSnowflakeSchematimExampleof

Galaxyschema

time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTabletime_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShippingFactTabletime_key

item_key

shipper_key

from_location

to_location

dollars_cost

units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper2022/10/2825ExampleofGalaxyschematime_4--TypicalOLAPOperations2022/10/28264--TypicalOLAPOperations2022Rollup(drill-up):summarizedatabyclimbinguphierarchyorbydimensionreductionDrilldown(rolldown):reverseofroll-upfromhigherlevelsummarytolowerlevelsummaryordetaileddata,orintroducingnewdimensionsSliceanddice:

projectandselect

Pivot(rotate):

reorientthecube,visualization,3Dtoseriesof2Dplanes.Otheroperationsdrillacross:involving(across)morethanonefacttabledrillthrough:throughthebottomlevelofthecubetoitsback-endrelationaltables(usingSQL)2022/10/2827Rollup(drill-up):summarizeDataWarehousing2022/10/2828DataWarehousing2022/10/221

1.传统数据库以及OLTP(On-LineTransaction

Processing联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。

2.因为,管理人员常常希望能够通过对组织中的大量数据进行分析,了解业务的的发展趋势。而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。

3.为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(DataWarehose)。数据仓库技术出现的背景2022/10/28291.传统数据库以及OLTP(On-LineTrans1--Whatisadatawarehouse?2022/10/28301--Whatisadatawarehouse?2WhatisDataWarehouse?Definedinmanydifferentways,butnotrigorously.Adecisionsupportdatabasethatismaintainedseparatelyfromtheorganization’soperationaldatabaseSupportinformationprocessingbyprovidingasolidplatformofconsolidated,historicaldataforanalysis.“Adatawarehouseisa

subject-oriented,integrated,time-variant,andnonvolatile

collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.Inmon2022/10/2831WhatisDataWarehouse?Defined数据仓库的定义与基本特性

WilliamH.Inmon在1993年所写的论著《BuildingtheDataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。文中他将数据仓库定义为:adatawarehouseisasubject-oriented,integrated,non-volatile,time-variantcollectionofdatainsupportofmanagementdecisions.

一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。2022/10/2832数据仓库的定义与基本特性WilliamH.Inmon在DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.2022/10/2833DataWarehouse—Subject-OrienteDataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.2022/10/2834DataWarehouse—IntegratedConstDataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.2022/10/2835DataWarehouse—TimeVariantTheDataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.2022/10/2836DataWarehouse—Non-VolatileApDataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis2022/10/2837DataWarehousevs.HeterogeneoDataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries2022/10/2838DataWarehousevs.OperationalOLTPvs.OLAP2022/10/2839OLTPvs.OLAP2022/10/2212WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled2022/10/2840WhySeparateDataWarehouse?Hi2--Amulti-dimensionaldatamodel2022/10/28412--Amulti-dimensionaldatamoFromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.2022/10/2842FromTablesandSpreadsheetstCube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid2022/10/2843Cube:ALatticeofCuboidsalltAConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity2022/10/2844AConceptHierarchy:DimensionMultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay2022/10/2845MultidimensionalDataSalesvolASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum

TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum2022/10/2846ASampleDataCubeTotalannualCuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid2022/10/2847CuboidsCorrespondingtotheC3--ConceptualModelingofDataWarehouses2022/10/28483--ConceptualModelingofDatModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakegalaxyschema:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschema2022/10/2849Modelingdatawarehouses:dimeExampleofStarSchema

time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable

time_key

item_key

branc

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论