11数据仓库技术讲座-57_第1页
11数据仓库技术讲座-57_第2页
11数据仓库技术讲座-57_第3页
11数据仓库技术讲座-57_第4页
11数据仓库技术讲座-57_第5页
已阅读5页,还剩61页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、12 十月 2022Data Warehousing and OLAP Technology1数据仓库和OLAP技术什么是数据仓库( What is a data warehouse)? 多维数据模型(A multi-dimensional data model)数据仓库体系结构(Data warehouse architecture)数据仓库实现(Data warehouse implementation)Further development of data cube technologyFrom data warehousing to data mining01三三月2020Data W

2、arehousing andOLAP Technology2数据库的的定义传统的数数据库技技术是以以单一的的数据资资源为中中心,同同时进行行从事务务处理,批处理理到决策策分析的的各类处处理;数据库主主要是为为自动化化,精简简工作任任务和高高速数据据采集服服务的。它的运运行是事事务驱动动,面向向应用的的,数据据库的根根本任务务是完成成数据操操作,即即及时安安全地将将当前事事务所产产生的记记录保存存下来。01三三月2020Data Warehousing andOLAP Technology3两种不同同的数据据处理需需求计算机系系统中存存在着两两类不同同的数据据处理需需求,即即:操作型处处理(事务

3、处处理):主要是是对一个个或一组组记录的的查询和和修改,这时候候人们关关心的是是响应时时间、数据的安安全性和和完整性性;分析型处处理(信息型型处理):用于于管理人人员的决决策分析析,如DDS(decisionsupportsystem)、多维分析析等。01三三月2020Data Warehousing andOLAP Technology4为什么要要建立数数据仓库库?数据DATA知识KNOWLEDGE决定DECISIONSPatternsTrendsFactsRelationsModelsAssociationsSequencesTargetMarketsFundsallocationTrad

4、ing optionsWheretoadvertiseCatalog mailinglistSalesgeography财经的Financial经济的Economic政府Government销售分数数Point-of-Sale人口统计计学Demographic生活方式式Lifestyle痛苦:太多数据据,无法法作出正正确判断断!01三三月2020Data Warehousing andOLAP Technology5What is DataWarehouse?数据仓库库是在企企业管理理和决策策中面向主题题的,集成的,与时间相相关的和不可修改改的数据集集合“Adata warehouseisas

5、ubject-oriented,integrated,time-variant,andnonvolatilecollection of datainsupportofmanagementsdecision-making process.”W.H.InmonData warehousing:Theprocessofconstructingand using datawarehouses01三三月2020Data Warehousing andOLAP Technology6Data WarehouseSubject-OrientedOrganizedaround major subjects,s

6、uch ascustomer,product,sales.Focusingonthemodelingand analysis of datafor decision makers,not on daily operationsortransactionprocessing.Providea simpleandconciseview aroundparticular subjectissues byexcludingdatathat arenotuseful in thedecisionsupport process.01三三月2020Data Warehousing andOLAP Techn

7、ology7面向应用用举例采购子系系统:订单(订订单号,供应商商号,总总金额,日期)订单细则则(订单单号,商商品号,类别,单价,数量)供应商(供应商商号,供供应商名名,地址址,电话话)销售子系系统:顾客(顾顾客号,姓名,性别,年龄,地址,电话)销售(员工号,顾客号号,商品品号,数数量,单单价日期期)库存管理理子系统统:领料单(领料单号号,领料人人,商品品号,数数量,日日期)进料单(进料单号号,订单单号,进进料人,收料人人,日期期)库存(商商品号,库房号号,库存存量,日日期)库房(库房号,仓库保保管员,地点,库存商商品描述述)人事管理理子系统统:员工(员员工号,姓名,性别,年龄,部门号号)部门(

8、部部门号,部门名名称,部部门主管管,电话话)面向主题题举例:商品:商品固有有信息:商品号号,商品品名,类类别,颜颜色等商品采购购信息:商品号号,供应应商号,供应价价,供应应日期,供应量量等商品销售售信息:商品号号,顾客客号,售售价,销销售日期期,销售售量等商品库存存信息:商品号号,库房房号,日日期,库库存量等等供应商:供应商固固有信息息:供应应商号,供应商商名,地地址,电电话等供应商品品信息:供应商商号,商商品号,供应价价,供应应日期,供应量量等顾客:顾客固有有信息:顾客号号,顾客客名,性性别,年年龄,住住址,电电话等顾客购物物信息:顾客号号,商品品号,售售价,购购买日期期,购买买量等01三三

9、月2020Data Warehousing andOLAP Technology8Data WarehouseIntegratedConstructedbyintegratingmultiple,heterogeneousdata sourcesrelational databases,flatfiles,on-linetransactionrecordsData cleaning anddata integration techniquesare applied.Ensureconsistencyinnaming conventions,encodingstructures,attribut

10、emeasures, etc.amongdifferent datasourcesE.g.,Hotelprice:currency, tax,breakfast covered, etc.When dataismovedtothe warehouse,itisconverted.01三三月2020Data Warehousing andOLAP Technology9Data WarehouseTime VariantThetimehorizon forthedatawarehouseissignificantlylongerthanthat of operational systems.Op

11、erationaldatabase:currentvaluedata.Data warehousedata:provide information fromahistorical perspective (e.g.,past 5-10years)Everykeystructure in thedata warehouseContainsanelement of time, explicitlyorimplicitlyButthe keyofoperationaldata mayormaynot contain“timeelement”.01三三月2020Data Warehousing and

12、OLAP Technology10Data WarehouseNon-VolatileAphysically separate storeofdata transformed fromthe operational environment.Operationalupdateofdatadoes notoccurinthedatawarehouseenvironment.Does notrequire transaction processing, recovery,andconcurrencycontrolmechanismsRequiresonly twooperations in data

13、accessing:initial loadingofdataandaccessofdata.01三三月2020Data Warehousing andOLAP Technology11Data Warehousevs.HeterogeneousDBMSTraditionalheterogeneous DB integration:Buildwrappers/mediatorsontopofheterogeneousdatabasesQuerydrivenapproachWhen aqueryisposedtoaclientsite,ameta-dictionaryisusedtotransl

14、atethe query intoqueriesappropriateforindividualheterogeneous sites involved,andthe resultsare integratedintoa globalanswersetComplex information filtering,competeforresourcesData warehouse:update-driven, highperformanceInformationfrom heterogeneoussourcesisintegrated in advanceand storedinwarehouse

15、s fordirectqueryand analysis01三三月2020Data Warehousing andOLAP Technology12Data Warehousevs.OperationalDBMSOLTP (on-line transaction processing)Majortask of traditional relationalDBMSDay-to-day operations: purchasing, inventory,banking,manufacturing,payroll,registration,accounting,etc.OLAP (on-line a

16、nalyticalprocessing)Majortask of datawarehouse systemData analysis anddecisionmakingDistinctfeatures(OLTPvs.OLAP):User andsystemorientation: customer vs.marketData contents:current,detailedvs.historical,consolidatedDatabasedesign: ER +applicationvs. star+subjectView:current,localvs.evolutionary,inte

17、gratedAccesspatterns: updatevs.read-only butcomplex queries01三三月2020Data Warehousing andOLAP Technology13OLTP vs.OLAP01三三月2020Data Warehousing andOLAP Technology14WhySeparateDataWarehouse?High performance forboth systemsDBMStunedforOLTP:access methods, indexing,concurrencycontrol,recoveryWarehousetu

18、nedfor OLAP: complexOLAPqueries,multidimensional view, consolidation.Differentfunctions anddifferentdata:missing data: Decision supportrequireshistoricaldata which operational DBsdonottypically maintaindata consolidation:DSrequiresconsolidation (aggregation,summarization)ofdatafrom heterogeneoussour

19、cesdata quality: differentsources typicallyuseinconsistent datarepresentations,codesandformatswhichhave to be reconciled01三三月2020Data Warehousing andOLAP Technology15Data Warehousing andOLAP TechnologyWhat is adatawarehouse?A multi-dimensional datamodelData warehousearchitectureData warehouseimpleme

20、ntationFurther development of datacubetechnologyFrom datawarehousingtodatamining01三三月2020Data Warehousing andOLAP Technology16From TablesandSpreadsheets to DataCubesA datawarehouse is based on amultidimensional datamodelwhichviewsdata in theform of adatacubeA datacube,suchassales, allowsdata to be m

21、odeledand viewedinmultipledimensionsDimensiontables,such asitem (item_name, brand,type),ortime(day,week,month, quarter, year)Fact table contains measures (such asdollars_sold) andkeys to eachofthe relateddimension tablesIndata warehousing literature, an n-Dbase cubeiscalled abase cuboid. Thetopmost0

22、-Dcuboid,whichholdsthehighest-levelofsummarization,iscalledtheapex cuboid.Thelatticeofcuboids forms adata cube.01三三月2020Data Warehousing andOLAP Technology17Cube:A LatticeofCuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locat

23、iontime,item,suppliertime,location,supplieritem,location,suppliertime,item,location,supplier0-D(apex) cuboid1-D cuboids2-D cuboids3-D cuboids4-D(base) cuboid01三三月2020Data Warehousing andOLAP Technology18Conceptual Modeling of DataWarehousesModelingdata warehouses: dimensions&measuresStar schema:A fa

24、cttableinthe middleconnectedtoasetofdimension tablesSnowflakeschema:A refinementofstarschemawheresomedimensionalhierarchyisnormalizedinto aset of smallerdimension tables, formingashapesimilar to snowflakeFact constellations:Multiplefact tablessharedimensiontables, viewedasa collectionofstars, theref

25、orecalledgalaxyschemaorfact constellation01三三月2020Data Warehousing andOLAP Technology19Example of StarSchema time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSalesFact Tabletime_keyitem_keybranch_keylocation_keyunits_solddollars_soldavg_salesMeasur

26、esitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch01三三月2020Data Warehousing andOLAP Technology20Example of SnowflakeSchematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSalesFact Tabletime_keyitem_keybranch_keylocation_keyunits_solddo

27、llars_soldavg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity01三三月2020Data Warehousing andOLAP Technology21Example of FactConstellationtime_keydayday_of_the_weekmonthquarteryeartim

28、elocation_keystreetcityprovince_or_streetcountrylocationSalesFact Tabletime_keyitem_keybranch_keylocation_keyunits_solddollars_soldavg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranchShippingFact Tabletime_keyitem_keyshipper_keyfrom_locationto_locationdo

29、llars_costunits_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper01三三月2020Data Warehousing andOLAP Technology22A DataMining Query Language,DMQL:LanguagePrimitivesCube Definition(FactTable)definecube:DimensionDefinition( DimensionTable)definedimensionas()Special Case(SharedDimensionTables

30、)Firsttime as “cube definition”definedimensionasincube01三三月2020Data Warehousing andOLAP Technology23Defininga StarSchema in DMQLdefinecubesales_star time,item,branch, location:dollars_sold=sum(sales_in_dollars), avg_sales= avg(sales_in_dollars),units_sold= count(*)definedimensiontimeas(time_key, day

31、,day_of_week, month,quarter,year)definedimensionitemas(item_key, item_name,brand, type, supplier_type)definedimensionbranchas(branch_key,branch_name, branch_type)definedimensionlocationas(location_key,street, city, province_or_state,country)01三三月2020Data Warehousing andOLAP Technology24Defininga Sno

32、wflakeSchemainDMQLdefinecubesales_snowflaketime, item, branch,location:dollars_sold=sum(sales_in_dollars), avg_sales= avg(sales_in_dollars),units_sold= count(*)definedimensiontimeas(time_key, day,day_of_week, month,quarter,year)definedimensionitemas(item_key, item_name,brand, type,supplier(supplier_

33、key, supplier_type)definedimensionbranchas(branch_key,branch_name, branch_type)definedimensionlocationas(location_key,street,city(city_key,province_or_state,country)01三三月2020Data Warehousing andOLAP Technology25Defininga FactConstellationinDMQLdefinecubesalestime,item,branch,location:dollars_sold=su

34、m(sales_in_dollars), avg_sales= avg(sales_in_dollars),units_sold= count(*)definedimensiontimeas(time_key, day,day_of_week, month,quarter,year)definedimensionitemas(item_key, item_name,brand, type, supplier_type)definedimensionbranchas(branch_key,branch_name, branch_type)definedimensionlocationas(loc

35、ation_key,street, city, province_or_state,country)definecubeshippingtime,item,shipper,from_location,to_location:dollar_cost= sum(cost_in_dollars),unit_shipped=count(*)definedimensiontimeastimeincubesalesdefinedimensionitemasitemincubesalesdefinedimensionshipperas(shipper_key, shipper_name,locationas

36、locationincubesales,shipper_type)definedimensionfrom_locationaslocationincubesalesdefinedimensionto_locationaslocationincubesales01三三月2020Data Warehousing andOLAP Technology26Measures:ThreeCategoriesdistributive: if theresultderivedbyapplyingthefunctiontonaggregatevalues is thesame as thatderivedbya

37、pplyingthefunctiononall thedata withoutpartitioning.E.g.,count(),sum(),min(), max().algebraic:ifitcanbecomputedbyanalgebraic function withMarguments(whereMisa boundedinteger), eachofwhichisobtainedbyapplyingadistributiveaggregate function.E.g.,avg(),min_N(),standard_deviation().holistic:ifthereisnoc

38、onstantboundonthestoragesize neededtodescribea subaggregate.E.g.,median(),mode(),rank().01三三月2020Data Warehousing andOLAP Technology27A ConceptHierarchy:Dimension(location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM.WindL.Chan.allregionofficecountryTorontoFrankfurtcity01三三月2020Data Ware

39、housing andOLAP Technology28View of Warehousesand HierarchiesSpecification of hierarchiesSchemahierarchydaymonthquarter;week yearSet_groupinghierarchy1.10 inexpensive01三三月2020Data Warehousing andOLAP Technology29Multidimensional DataSalesvolumeasafunctionofproduct,month,and regionProductRegionMonthD

40、imensions:Product,Location,TimeHierarchicalsummarizationpathsIndustryRegionYearCategoryCountryQuarterProductCityMonthWeekOfficeDay01三三月2020Data Warehousing andOLAP Technology30A SampleData CubeTotalannualsalesofTVinU.S.A.DateProductCountryAll, All, Allsumsum TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicos

41、um01三三月2020Data Warehousing andOLAP Technology31Cuboids Correspondingtothe Cubeallproductdatecountryproduct,dateproduct,countrydate,countryproduct,date,country0-D(apex) cuboid1-D cuboids2-D cuboids3-D(base) cuboid01三三月2020Data Warehousing andOLAP Technology32Browsinga DataCubeVisualizationOLAP capab

42、ilitiesInteractivemanipulation01三三月2020Data Warehousing andOLAP Technology33Typical OLAPOperationsRoll up (drill-up):summarizedatabyclimbinguphierarchyorbydimension reductionDrilldown (roll down):reverse of roll-upfrom higherlevelsummary to lower level summaryordetaileddata,orintroducingnew dimensio

43、nsSliceanddice:project andselectPivot(rotate):reorientthecube,visualization,3Dtoseries of 2D planes.Otheroperationsdrillacross:involving(across)morethan onefact tabledrillthrough:through thebottomlevelofthe cubetoits back-end relationaltables (usingSQL)01三三月2020Data Warehousing andOLAP Technology34A

44、 Star-Net Query Model ShippingMethodAIR-EXPRESSTRUCKORDERCustomerOrdersCONTRACTSCustomerProductPRODUCT GROUPPRODUCT LINEPRODUCT ITEMSALESPERSONDISTRICTDIVISIONOrganizationPromotionCITYCOUNTRYREGIONLocationDAILYQTRLYANNUALYTimeEach circleiscalledafootprint01三三月2020Data Warehousing andOLAP Technology3

45、5Data Warehousing andOLAP Technologyfor DataMiningWhat is adatawarehouse?A multi-dimensional datamodelData warehousearchitectureData warehouseimplementationFurther development of datacubetechnologyFrom datawarehousingtodatamining01三三月2020Data Warehousing andOLAP Technology36DesignofaData Warehouse:A

46、BusinessAnalysisFrameworkFour views regardingthedesign of adatawarehouseTop-downviewallowsselection of therelevantinformationnecessaryfor thedata warehouseData sourceviewexposes theinformationbeingcaptured,stored,andmanagedbyoperationalsystemsData warehouseviewconsistsoffact tablesanddimension table

47、sBusinessqueryviewsees theperspectivesofdatainthewarehouse fromthe viewofend-user01三三月2020Data Warehousing andOLAP Technology37Data WarehouseDesignProcessTop-down,bottom-up approachesoracombinationofbothTop-down: Startswith overalldesign andplanning(mature)Bottom-up: Startswith experiments andprotot

48、ypes (rapid)From software engineering point of viewWaterfall:structured andsystematic analysis at eachstepbeforeproceedingtothenextSpiral:rapidgeneration of increasinglyfunctional systems, short turnaround time, quick turnaroundTypical datawarehouse designprocessChooseabusinessprocesstomodel,e.g.,or

49、ders,invoices,etc.Choosethegrain(atomiclevelofdata)ofthebusinessprocessChoosethedimensionsthat willapplytoeachfact table recordChoosethemeasurethat willpopulateeachfact table record01三三月2020Data Warehousing andOLAP Technology38Multi-TieredArchitectureDataWarehouseExtractTransformLoadRefreshOLAP Engi

50、neAnalysisQueryReportsData miningMonitor&IntegratorMetadataData SourcesFront-EndToolsServeData MartsOperational DBsothersourcesData StorageOLAP Server01三三月2020Data Warehousing andOLAP Technology39SourceDatabasesData Extraction,Transformation, loadWarehouseAdmin.ToolsExtract, Transformand LoadDataMod

51、elingToolCentralMetadataArchitectedData MartsData Accessand AnalysisEnd-UserDW ToolsCentral DataWarehouseCentral DataWarehouseMid-TierMid-TierDataMartDataMartLocal MetadataLocal MetadataLocal MetadataMetadataExchangeMDBDataCleansingToolRelationalAppl. PackageLegacyExternalRDBMSRDBMS体系结构构Pieter,1998数

52、据仓库库的焦点点问题-数据的获获得、存存储和使使用RelationalPackageLegacyExternalsourceDataCleanToolDataStagingEnterpriseDataWarehouseDatamartDatamartRDBMSROLAPRDBMSEnd-UserToolEnd-UserToolMDBEnd-UserToolEnd-UserTool数据仓库库和集市市的加载载能力至至关重要要数据仓库库和集市市的查询询输出能能力至关关重要ETL工具去掉操作作型数据据库中的的不需要要的数据据统一转换换数据的的名称和和定义计算汇总总数据和和派生数数据估计遗失失数据的的缺

53、省值值调节源数数据的定定义变化化01三三月2020Data Warehousing andOLAP Technology42ThreeData WarehouseModelsEnterprise warehousecollectsallofthe information about subjects spanning theentireorganizationData Marta subsetofcorporate-widedata thatisofvaluetoaspecificgroupsofusers.Itsscopeisconfinedtospecific, selected grou

54、ps,suchasmarketingdatamartIndependentvs.dependent (directlyfrom warehouse)datamartVirtual warehouseA setofviewsover operational databasesOnly someofthe possible summaryviewsmay be materialized01三三月2020Data Warehousing andOLAP Technology43Data WarehouseDevelopment:ARecommendedApproachDefineahigh-leve

55、l corporatedata modelData MartData MartDistributedData MartsMulti-Tier DataWarehouseEnterprise DataWarehouseModelrefinementModelrefinement01三三月2020Data Warehousing andOLAP Technology44OLAP ServerArchitecturesRelational OLAP(ROLAP)Userelationalorextended-relational DBMStostoreand managewarehousedataa

56、ndOLAPmiddlewaretosupport missingpiecesInclude optimizationofDBMS backend, implementation of aggregation navigationlogic, andadditional tools andservicesgreater scalabilityMultidimensional OLAP(MOLAP)Array-basedmultidimensional storageengine (sparsematrix techniques)fast indexing to pre-computedsumm

57、arized dataHybridOLAP(HOLAP)User flexibility,e.g.,low level:relational,high-level:arraySpecializedSQLserversspecializedsupport forSQLqueriesover star/snowflake schemas01三三月2020Data Warehousing andOLAP Technology45Data Warehousing andOLAP Technologyfor DataMiningWhat is adatawarehouse?A multi-dimensi

58、onal datamodelData warehousearchitectureData warehouseimplementationFurther development of datacubetechnologyFrom datawarehousingtodatamining01三三月2020Data Warehousing andOLAP Technology46EfficientDataCube ComputationData cubecan be viewedasa latticeofcuboidsThebottom-mostcuboid is thebase cuboidThet

59、op-mostcuboid (apex)containsonly onecellHowmanycuboids in an n-dimensionalcubewith Llevels?MaterializationofdatacubeMaterializeevery(cuboid)(fullmaterialization),none(nomaterialization),orsome (partial materialization)SelectionofwhichcuboidstomaterializeBasedonsize,sharing,accessfrequency,etc.01三三月2

60、020Data Warehousing andOLAP Technology47Cube OperationCube definitionand computation in DMQLdefinecubesalesitem,city,year:sum(sales_in_dollars)compute cubesalesTransformitintoa SQL-like language (with anew operatorcube by, introducedbyGrayetal.96)SELECTitem,city,year,SUM (amount)FROM SALESCUBE BYite

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论