版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
DataWarehousing2022/10/281DataWarehousing2022/10/221
1.传统数据库以及OLTP(On-LineTransaction
Processing联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。
2.因为,管理人员常常希望能够通过对组织中的大量数据进行分析,了解业务的的发展趋势。而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。
3.为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(DataWarehose)。数据仓库技术出现的背景2022/10/2821.传统数据库以及OLTP(On-LineTrans1--Whatisadatawarehouse?2022/10/2831--Whatisadatawarehouse?2WhatisDataWarehouse?Definedinmanydifferentways,butnotrigorously.Adecisionsupportdatabasethatismaintainedseparatelyfromtheorganization’soperationaldatabaseSupportinformationprocessingbyprovidingasolidplatformofconsolidated,historicaldataforanalysis.“Adatawarehouseisa
subject-oriented,integrated,time-variant,andnonvolatile
collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.Inmon2022/10/284WhatisDataWarehouse?Defined数据仓库的定义与基本特性
WilliamH.Inmon在1993年所写的论著《BuildingtheDataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。文中他将数据仓库定义为:adatawarehouseisasubject-oriented,integrated,non-volatile,time-variantcollectionofdatainsupportofmanagementdecisions.
一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。2022/10/285数据仓库的定义与基本特性WilliamH.Inmon在DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.2022/10/286DataWarehouse—Subject-OrienteDataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.2022/10/287DataWarehouse—IntegratedConstDataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.2022/10/288DataWarehouse—TimeVariantTheDataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.2022/10/289DataWarehouse—Non-VolatileApDataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis2022/10/2810DataWarehousevs.HeterogeneoDataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries2022/10/2811DataWarehousevs.OperationalOLTPvs.OLAP2022/10/2812OLTPvs.OLAP2022/10/2212WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled2022/10/2813WhySeparateDataWarehouse?Hi2--Amulti-dimensionaldatamodel2022/10/28142--Amulti-dimensionaldatamoFromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.2022/10/2815FromTablesandSpreadsheetstCube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid2022/10/2816Cube:ALatticeofCuboidsalltAConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity2022/10/2817AConceptHierarchy:DimensionMultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay2022/10/2818MultidimensionalDataSalesvolASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum
TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum2022/10/2819ASampleDataCubeTotalannualCuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid2022/10/2820CuboidsCorrespondingtotheC3--ConceptualModelingofDataWarehouses2022/10/28213--ConceptualModelingofDatModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakegalaxyschema:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschema2022/10/2822Modelingdatawarehouses:dimeExampleofStarSchema
time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch2022/10/2823ExampleofStarSchematime_ExampleofSnowflakeSchematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSalesFactTable
time_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity2022/10/2824ExampleofSnowflakeSchematimExampleof
Galaxyschema
time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTabletime_key
item_key
branch_key
location_key
units_sold
dollars_sold
avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShippingFactTabletime_key
item_key
shipper_key
from_location
to_location
dollars_cost
units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper2022/10/2825ExampleofGalaxyschematime_4--TypicalOLAPOperations2022/10/28264--TypicalOLAPOperations2022Rollup(drill-up):summarizedatabyclimbinguphierarchyorbydimensionreductionDrilldown(rolldown):reverseofroll-upfromhigherlevelsummarytolowerlevelsummaryordetaileddata,orintroducingnewdimensionsSliceanddice:
projectandselect
Pivot(rotate):
reorientthecube,visualization,3Dtoseriesof2Dplanes.Otheroperationsdrillacross:involving(across)morethanonefacttabledrillthrough:throughthebottomlevelofthecubetoitsback-endrelationaltables(usingSQL)2022/10/2827Rollup(drill-up):summarizeDataWarehousing2022/10/2828DataWarehousing2022/10/221
1.传统数据库以及OLTP(On-LineTransaction
Processing联机事务处理)在日常的管理事务处理中获得了巨大的成功,但是对管理人员的决策分析要求却无法满足。
2.因为,管理人员常常希望能够通过对组织中的大量数据进行分析,了解业务的的发展趋势。而传统数据库只保留了当前的业务处理信息,缺乏决策分析所需要的大量的历史信息。
3.为满足管理人员的决策分析需要,就需要在数据库的基础上产生适应决策分析的数据环境——数据仓库(DataWarehose)。数据仓库技术出现的背景2022/10/28291.传统数据库以及OLTP(On-LineTrans1--Whatisadatawarehouse?2022/10/28301--Whatisadatawarehouse?2WhatisDataWarehouse?Definedinmanydifferentways,butnotrigorously.Adecisionsupportdatabasethatismaintainedseparatelyfromtheorganization’soperationaldatabaseSupportinformationprocessingbyprovidingasolidplatformofconsolidated,historicaldataforanalysis.“Adatawarehouseisa
subject-oriented,integrated,time-variant,andnonvolatile
collectionofdatainsupportofmanagement’sdecision-makingprocess.”—W.H.Inmon2022/10/2831WhatisDataWarehouse?Defined数据仓库的定义与基本特性
WilliamH.Inmon在1993年所写的论著《BuildingtheDataWarehouse》首先系统地阐述了关于数据仓库的思想、理论,为数据仓库的发展奠定了历史基石。文中他将数据仓库定义为:adatawarehouseisasubject-oriented,integrated,non-volatile,time-variantcollectionofdatainsupportofmanagementdecisions.
一个面向主题的、集成的、非易失性的、随时间变化的数据的集合,以用于支持管理层决策过程。2022/10/2832数据仓库的定义与基本特性WilliamH.Inmon在DataWarehouse—Subject-OrientedOrganizedaroundmajorsubjects,suchascustomer,product,sales.Focusingonthemodelingandanalysisofdatafordecisionmakers,notondailyoperationsortransactionprocessing.Provideasimpleandconciseviewaroundparticularsubjectissuesbyexcludingdatathatarenotusefulinthedecisionsupportprocess.2022/10/2833DataWarehouse—Subject-OrienteDataWarehouse—IntegratedConstructedbyintegratingmultiple,heterogeneousdatasourcesrelationaldatabases,flatfiles,on-linetransactionrecordsDatacleaninganddataintegrationtechniquesareapplied.Ensureconsistencyinnamingconventions,encodingstructures,attributemeasures,etc.amongdifferentdatasourcesE.g.,Hotelprice:currency,tax,breakfastcovered,etc.Whendataismovedtothewarehouse,itisconverted.2022/10/2834DataWarehouse—IntegratedConstDataWarehouse—TimeVariantThetimehorizonforthedatawarehouseissignificantlylongerthanthatofoperationalsystems.Operationaldatabase:currentvaluedata.Datawarehousedata:provideinformationfromahistoricalperspective(e.g.,past5-10years)EverykeystructureinthedatawarehouseContainsanelementoftime,explicitlyorimplicitlyButthekeyofoperationaldatamayormaynotcontain“timeelement”.2022/10/2835DataWarehouse—TimeVariantTheDataWarehouse—Non-VolatileAphysicallyseparatestoreofdatatransformedfromtheoperationalenvironment.Operationalupdateofdatadoesnotoccurinthedatawarehouseenvironment.Doesnotrequiretransactionprocessing,recovery,andconcurrencycontrolmechanismsRequiresonlytwooperationsindataaccessing:initialloadingofdataandaccessofdata.2022/10/2836DataWarehouse—Non-VolatileApDataWarehousevs.HeterogeneousDBMSTraditionalheterogeneousDBintegration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhenaqueryisposedtoaclientsite,ameta-dictionaryisusedtotranslatethequeryintoqueriesappropriateforindividualheterogeneoussitesinvolved,andtheresultsareintegratedintoaglobalanswersetComplexinformationfiltering,competeforresourcesDatawarehouse:update-driven,highperformanceInformationfromheterogeneoussourcesisintegratedinadvanceandstoredinwarehousesfordirectqueryandanalysis2022/10/2837DataWarehousevs.HeterogeneoDataWarehousevs.OperationalDBMSOLTP(on-linetransactionprocessing)MajortaskoftraditionalrelationalDBMSDay-to-dayoperations:purchasing,inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP(on-lineanalyticalprocessing)MajortaskofdatawarehousesystemDataanalysisanddecisionmakingDistinctfeatures(OLTPvs.OLAP):Userandsystemorientation:customervs.marketDatacontents:current,detailedvs.historical,consolidatedDatabasedesign:ER+applicationvs.star+subjectView:current,localvs.evolutionary,integratedAccesspatterns:updatevs.read-onlybutcomplexqueries2022/10/2838DataWarehousevs.OperationalOLTPvs.OLAP2022/10/2839OLTPvs.OLAP2022/10/2212WhySeparateDataWarehouse?HighperformanceforbothsystemsDBMS—tunedforOLTP:accessmethods,indexing,concurrencycontrol,recoveryWarehouse—tunedforOLAP:complexOLAPqueries,multidimensionalview,consolidation.Differentfunctionsanddifferentdata:missingdata:DecisionsupportrequireshistoricaldatawhichoperationalDBsdonottypicallymaintaindataconsolidation:DSrequiresconsolidation(aggregation,summarization)ofdatafromheterogeneoussourcesdataquality:differentsourcestypicallyuseinconsistentdatarepresentations,codesandformatswhichhavetobereconciled2022/10/2840WhySeparateDataWarehouse?Hi2--Amulti-dimensionaldatamodel2022/10/28412--Amulti-dimensionaldatamoFromTablesandSpreadsheetstoDataCubesAdatawarehouseisbasedonamultidimensionaldatamodelwhichviewsdataintheformofadatacubeAdatacube,suchassales,allowsdatatobemodeledandviewedinmultipledimensionsDimensiontables,suchasitem(item_name,brand,type),ortime(day,week,month,quarter,year)Facttablecontainsmeasures(suchasdollars_sold)andkeystoeachoftherelateddimensiontablesIndatawarehousingliterature,ann-Dbasecubeiscalledabasecuboid.Thetopmost0-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapexcuboid.Thelatticeofcuboidsformsadatacube.2022/10/2842FromTablesandSpreadsheetstCube:ALatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex)cuboid1-Dcuboids2-Dcuboids3-Dcuboids4-D(base)cuboid2022/10/2843Cube:ALatticeofCuboidsalltAConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan..................allregionofficecountryTorontoFrankfurtcity2022/10/2844AConceptHierarchy:DimensionMultidimensionalDataSalesvolumeasafunctionofproduct,month,andregionProductRegionMonthDimensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay2022/10/2845MultidimensionalDataSalesvolASampleDataCubeTotalannualsalesofTVinU.S.A.DateProductCountryAll,All,Allsumsum
TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum2022/10/2846ASampleDataCubeTotalannualCuboidsCorrespondingtotheCubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex)cuboid1-Dcuboids2-Dcuboids3-D(base)cuboid2022/10/2847CuboidsCorrespondingtotheC3--ConceptualModelingofDataWarehouses2022/10/28483--ConceptualModelingofDatModelingdatawarehouses:dimensions&measuresStarschema:AfacttableinthemiddleconnectedtoasetofdimensiontablesSnowflakeschema:Arefinementofstarschemawheresomedimensionalhierarchyisnormalizedintoasetofsmallerdimensiontables,formingashapesimilartosnowflakegalaxyschema:Multiplefacttablessharedimensiontables,viewedasacollectionofstars,thereforecalledgalaxyschema2022/10/2849Modelingdatawarehouses:dimeExampleofStarSchema
time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFactTable
time_key
item_key
branc
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 定制快餐桌椅合同
- 简单保证借款合同打印
- 房屋代理包租合同简易模板
- 简单超市租赁合同范本
- 中考物理复习专项多选题组1课件
- 高考总复习英语(人教版)随堂巩固训练选修6Unit3Ahealthylife
- 13水的浮力(原卷版)
- 1.1地球的宇宙环境课件高中地理人教版(2019)必修一
- 天津市静海县第一中学2017-2018学年高一4月学生学业能力调研测试历史试题(合格类)
- 第18讲构词法(练习)-2024年中考英语一轮复习
- 高中有机化学综合练习题(附答案)
- 涂料涂饰施工质量验收评定表
- 产品标识控制程序文件
- 提高内镜中心内镜洗消合格率PDCA
- 建设工程质量管理手册
- DB32-T 3904-2020电动自行车停放充电场所消防技术规范doc-(高清现行)
- 园长思想政治鉴定范文(5篇)
- 卫生系列评审高级专业技术资格答辩题解(神经外科)
- 2022年工程项目技术管理人员批评与自我批评
- 幼儿园PPT课件:数与运算2
- 常德自来水公司水表管理制度
评论
0/150
提交评论