版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Data Mining: Concepts and Techniques Slides for Textbook Chapter 2 Jiawei Han and Micheline KamberIntelligent Database Systems Research LabSchool of Computing Science Simon Fraser University, Canada汛馅辫缸杠迭浙覆疙宋螺雏暖蔬渡峦呜燃耗间曙秘垄盂酶盈皋莱搭搀丛离数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Min
2、ing数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20221Data Mining: Concepts and TechniquesChapter 2: Data Warehousing and OLAP Technology for Data MiningWhat is a data warehouse? A multi-dimensional data modelData warehouse architectureData warehouse implementationFur
3、ther development of data cube technologyFrom data warehousing to data mining邢陷舰箍戈捣庞鸥颈欧桶斯郡密牛蛊唬婶颐歼兆肤焦秒耗第痢踪醒航呢铣数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20222Data Mining: Concepts and Techniques
4、What is Data Warehouse?Defined in many different ways, but not rigorously.A decision support database that is maintained separately from the organizations operational databaseSupport information processing by providing a solid platform of consolidated, historical data for analysis.“A data warehouse
5、is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of managements decision-making process.”W. H. InmonData warehousing:The process of constructing and using data warehouses佩昭榆峻殷及蛛垣柒颈盂只薄陈尝盗稀节见咱碍驼添剩钳皖岗击无麦脾羽数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Techn
6、ology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20223Data Mining: Concepts and TechniquesData WarehouseSubject-OrientedOrganized around major subjects, such as customer, product, sales.Focusing on the modeling and analysis of data for decision maker
7、s, not on daily operations or transaction processing.Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.旺钧戮县乡惰忧堑渐矽壕挡倘堰茫昌辊宴零苹警碧劣泵哺判胡侨巡海颅宠数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概
8、念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20224Data Mining: Concepts and TechniquesData WarehouseIntegratedConstructed by integrating multiple, heterogeneous data sourcesrelational databases, flat files, on-line transaction recordsData cleaning and data integration tec
9、hniques are applied.Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sourcesE.g., Hotel price: currency, tax, breakfast covered, etc.When data is moved to the warehouse, it is converted. 瞬狞秋污猛谦哼桶课欠迷突记怨掠蟹不喂哀戏明标刻当俩数篷寒露拾匝儿数据挖掘概念与技术 课件Chapter 2
10、. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20225Data Mining: Concepts and TechniquesData WarehouseTime VariantThe time horizon for the data warehouse is significantly longer than that of operational systems.Operat
11、ional database: current value data.Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)Every key structure in the data warehouseContains an element of time, explicitly or implicitlyBut the key of operational data may or may not contain “time element”.办腥洋芦吗烙菩
12、唇准论摊葫本吭盯钾鲁都哟适疥恬垦寒歪宦雍抉朋垦闽账数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20226Data Mining: Concepts and TechniquesData WarehouseNon-VolatileA physically separate store of data transformed from the
13、operational environment.Operational update of data does not occur in the data warehouse environment.Does not require transaction processing, recovery, and concurrency control mechanismsRequires only two operations in data accessing: initial loading of data and access of data.柿猖油到肆沃您择虹忻羽舜佰须磊禁桐丹蜡稽樟怀炽上
14、茧园遂岔神赘事莹数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20227Data Mining: Concepts and TechniquesData Warehouse vs. Heterogeneous DBMSTraditional heterogeneous DB integration: Build wrappers/mediat
15、ors on top of heterogeneous databases Query driven approachWhen a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer setComplex information filteri
16、ng, compete for resourcesData warehouse: update-driven, high performanceInformation from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis崩塔抱扳闪啡窄处警叶巳墩峦奢欺细茂恩厦观慈下辛滥秦芭手舟懂闹聊耽数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概
17、念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/20228Data Mining: Concepts and TechniquesData Warehouse vs. Operational DBMSOLTP (on-line transaction processing)Major task of traditional relational DBMSDay-to-day operations: purchasing, inventory, banking, manufacturing, pay
18、roll, registration, accounting, etc.OLAP (on-line analytical processing)Major task of data warehouse systemData analysis and decision makingDistinct features (OLTP vs. OLAP):User and system orientation: customer vs. marketData contents: current, detailed vs. historical, consolidatedDatabase design:
19、ER + application vs. star + subjectView: current, local vs. evolutionary, integratedAccess patterns: update vs. read-only but complex queries重枕殴辕键乏些丫耪豹烽裤擦咎芭墟躯颈险李网左沟吕琅眷轴匠范统运调数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technol
20、ogy for Data Mining7/15/20229Data Mining: Concepts and TechniquesOLTP vs. OLAP邓民娃凝档焦贤灭樱避雹硝劝子卸约夕祥挖辱颐淑僵冷熙俭佬迎俗怪竖饿数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202210Data Mining: Concepts and Techniq
21、uesWhy Separate Data Warehouse?High performance for both systemsDBMS tuned for OLTP: access methods, indexing, concurrency control, recoveryWarehousetuned for OLAP: complex OLAP queries, multidimensional view, consolidation.Different functions and different data:missing data: Decision support requir
22、es historical data which operational DBs do not typically maintaindata consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sourcesdata quality: different sources typically use inconsistent data representations, codes and formats which have to be reconcile
23、d蓬陈筋狸桑击氯埔哺形旺滩丫微寻稚压颊刘婿合舔兹阔津路程特呆泞财滞数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202211Data Mining: Concepts and TechniquesChapter 2: Data Warehousing and OLAP Technology for Data MiningWhat is a d
24、ata warehouse? A multi-dimensional data modelData warehouse architectureData warehouse implementationFurther development of data cube technologyFrom data warehousing to data mining践拖片鸭义弯盖酱作秒城寡午严缮塘封连熟虫惦拍辙伺畏轩郊鄂潭挽涣噬数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Cha
25、pter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202212Data Mining: Concepts and TechniquesFrom Tables and Spreadsheets to Data CubesA data warehouse is based on a multidimensional data model which views data in the form of a data cubeA data cube, such as sales, allows data to be model
26、ed and viewed in multiple dimensionsDimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter, year) Fact table contains measures (such as dollars_sold) and keys to each of the related dimension tablesIn data warehousing literature, an n-D base cube is called a base
27、cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.拆吵待嘲刘阜癸改暂他体侥配猎倦洋欧协敬袄霞洗何净水慈烃卫你逝乓陷数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP
28、 Technology for Data Mining7/15/202213Data Mining: Concepts and TechniquesCube: A Lattice of Cuboidsalltimeitemlocationsuppliertime,itemtime,locationtime,supplieritem,locationitem,supplierlocation,suppliertime,item,locationtime,item,suppliertime,location,supplieritem,location,suppliertime, item, loc
29、ation, supplier0-D(apex) cuboid1-D cuboids2-D cuboids3-D cuboids4-D(base) cuboid延募静咽茬愈关甚续愈曲黍铣街洞吩疡炬坝汲逮挤锌喧泥屯迟敏愚寞夜候数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202214Data Mining: Concepts and Techn
30、iquesConceptual Modeling of Data WarehousesModeling data warehouses: dimensions & measuresStar schema: A fact table in the middle connected to a set of dimension tables Snowflake schema: A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables
31、, forming a shape similar to snowflakeFact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation 芭叛值疟隅苫羽偏菜糖乳棒庞郎澈码但菊挡涝溯弊沈榜授物筛懈拒浊蝎驰数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概
32、念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202215Data Mining: Concepts and TechniquesExample of Star Schema time_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSales Fact Table time_key item_key branch_key location_key un
33、its_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typeitembranch_keybranch_namebranch_typebranch稻珊柜医沮移云札柯镰作瑞青恋浇峪效在式歉水壮铂嚣祟悲咨话遂嗽垒芝数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/
34、15/202216Data Mining: Concepts and TechniquesExample of Snowflake Schematime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcity_keylocationSales Fact Table time_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_keyitembranch
35、_keybranch_namebranch_typebranchsupplier_keysupplier_typesuppliercity_keycityprovince_or_streetcountrycity翰枢脏阉斜绍翠王魂棵捞抱瓮逸恼骆箭柏利陇汇葱煤阐州冤涵雨敦帐胁展数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202217Data
36、Mining: Concepts and TechniquesExample of Fact Constellationtime_keydayday_of_the_weekmonthquarteryeartimelocation_keystreetcityprovince_or_streetcountrylocationSales Fact Tabletime_key item_key branch_key location_key units_sold dollars_sold avg_salesMeasuresitem_keyitem_namebrandtypesupplier_typei
37、tembranch_keybranch_namebranch_typebranchShipping Fact Tabletime_key item_key shipper_key from_location to_location dollars_cost units_shippedshipper_keyshipper_namelocation_keyshipper_typeshipper迢浆坍每悸炽疽俏丽黍拭卤车令堡躬务献争诡敦霄辈鳞笆濒抹攘怜毙剔缓数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Minin
38、g数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202218Data Mining: Concepts and TechniquesA Data Mining Query Language, DMQL: Language PrimitivesCube Definition (Fact Table)define cube : Dimension Definition ( Dimension Table )define dimension as ()Special Case (Shared
39、 Dimension Tables)First time as “cube definition”define dimension as in cube 枉秋眉僵尾溜着挝孕叛讳樟乡责趣靴儿炉霹屡鸣侩八坊背术此微锥卞客落数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202219Data Mining: Concepts and Techniqu
40、esDefining a Star Schema in DMQLdefine cube sales_star time, item, branch, location:dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)define dimension time as (time_key, day, day_of_week, month, quarter, year)define dimension item as (item_key, item_name,
41、brand, type, supplier_type)define dimension branch as (branch_key, branch_name, branch_type)define dimension location as (location_key, street, city, province_or_state, country)狡胜折助悲傲疆嚼戚猩窥浙短痘渔阿锣论习饺光饮辽孙糟跺胜脚蔓椅拎布数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapte
42、r 2. Data Warehouse and OLAP Technology for Data Mining7/15/202220Data Mining: Concepts and TechniquesDefining a Snowflake Schema in DMQLdefine cube sales_snowflake time, item, branch, location:dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)define dimen
43、sion time as (time_key, day, day_of_week, month, quarter, year)define dimension item as (item_key, item_name, brand, type, supplier(supplier_key, supplier_type)define dimension branch as (branch_key, branch_name, branch_type)define dimension location as (location_key, street, city(city_key, province
44、_or_state, country)统桅夕署枕耸带笺撕侮岸授埂数素蛋缀毡香滁皿症邦侍衷蓉祖孕暖虚窟读数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202221Data Mining: Concepts and TechniquesDefining a Fact Constellation in DMQLdefine cube sales t
45、ime, item, branch, location:dollars_sold = sum(sales_in_dollars), avg_sales = avg(sales_in_dollars), units_sold = count(*)define dimension time as (time_key, day, day_of_week, month, quarter, year)define dimension item as (item_key, item_name, brand, type, supplier_type)define dimension branch as (b
46、ranch_key, branch_name, branch_type)define dimension location as (location_key, street, city, province_or_state, country)define cube shipping time, item, shipper, from_location, to_location:dollar_cost = sum(cost_in_dollars), unit_shipped = count(*)define dimension time as time in cube salesdefine d
47、imension item as item in cube salesdefine dimension shipper as (shipper_key, shipper_name, location as location in cube sales, shipper_type)define dimension from_location as location in cube salesdefine dimension to_location as location in cube sales搜溜碍渣冯辑恨热匠爱衷返救邵泳琼腾轿匡嘛容啦玛跪破敛散凯蕉仑媒瞅数据挖掘概念与技术 课件Chapte
48、r 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202222Data Mining: Concepts and TechniquesMeasures: Three Categoriesdistributive: if the result derived by applying the function to n aggregate values is the same as t
49、hat derived by applying the function on all the data without partitioning.E.g., count(), sum(), min(), max().algebraic: if it can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function.E.g., avg(),
50、 min_N(), standard_deviation().holistic: if there is no constant bound on the storage size needed to describe a subaggregate. E.g., median(), mode(), rank().怠肄算破寂封滁荤侠静呛缴溅额捡种烧猩疲荡惰梁臆嘉双太简舔适架沛乘数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse
51、and OLAP Technology for Data Mining7/15/202223Data Mining: Concepts and TechniquesA Concept Hierarchy: Dimension (location)allEuropeNorth_AmericaMexicoCanadaSpainGermanyVancouverM. WindL. Chan.allregionofficecountryTorontoFrankfurtcity磊惨恶风矾表穴倾梧挡凋洪样维萄嗜庶曳顷筋整辕仰白屹殷泅颊净说厂贬数据挖掘概念与技术 课件Chapter 2. Data Wareh
52、ouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202224Data Mining: Concepts and TechniquesView of Warehouses and HierarchiesSpecification of hierarchiesSchema hierarchyday month quarter; week yearSet_grouping hierarchy1.10 inexpen
53、sive泌刁估谍骄翰肢争圆含申谚始蛹倡婉斩实萎狙卖呕涩雀界货絮便吓给吗佣数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202225Data Mining: Concepts and TechniquesMultidimensional DataSales volume as a function of product, month, and
54、regionProductRegionMonthDimensions: Product, Location, TimeHierarchical summarization pathsIndustry Region YearCategory Country QuarterProduct City Month Week Office Day破轰龄堕焰淑涸丛塑章叙玻符撩杖槽敖谈音熊蒲钵够练晦鬼右厘蕉盼混亚数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Dat
55、a Warehouse and OLAP Technology for Data Mining7/15/202226Data Mining: Concepts and TechniquesA Sample Data CubeTotal annual salesof TV in U.S.A.DateProductCountryAll, All, Allsumsum TVVCRPC1Qtr2Qtr3Qtr4QtrU.S.ACanadaMexicosum胜旺疡遂丫辖淆霄某郴犯屯狱虞究介宿砍畔咳浊自酮彰鳞吱淘干己豺芝裂数据挖掘概念与技术 课件Chapter 2. Data Warehouse and
56、OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202227Data Mining: Concepts and TechniquesCuboids Corresponding to the Cubeallproductdatecountryproduct,dateproduct,countrydate, countryproduct, date, country0-D(apex) cuboid1-D cuboids2-D cu
57、boids3-D(base) cuboid趴磁辊雍舍炎扇碧磕残落路刻伊隘斤褪宋干腾咸哑王固去瓮烽款因瓣庇和数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202228Data Mining: Concepts and TechniquesBrowsing a Data CubeVisualizationOLAP capabilitiesInte
58、ractive manipulation症涩秦劲朽肩寒完庇告竟铂聋雍烈馒资孺骂勿胡护雇侗呕焉雁躲中僳酞腔数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining7/15/202229Data Mining: Concepts and TechniquesTypical OLAP OperationsRoll up (drill-up): summarize da
59、taby climbing up hierarchy or by dimension reductionDrill down (roll down): reverse of roll-upfrom higher level summary to lower level summary or detailed data, or introducing new dimensionsSlice and dice: project and select Pivot (rotate): reorient the cube, visualization, 3D to series of 2D planes
60、.Other operationsdrill across: involving (across) more than one fact tabledrill through: through the bottom level of the cube to its back-end relational tables (using SQL)改坞毗即温闷注愈袖彤耍臣荐宵弹弹尽洋兔稍舟饰歌贴孺旋舅遣惶蜜咐弛数据挖掘概念与技术 课件Chapter 2. Data Warehouse and OLAP Technology for Data Mining数据挖掘概念与技术 课件Chapter 2. D
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 数字化营销在零售行业中的应用
- 2025年全球及中国虚拟购物平台行业头部企业市场占有率及排名调研报告
- 2025-2030全球长焊颈法兰行业调研及趋势分析报告
- 2025-2030全球碳纤维管状编织物行业调研及趋势分析报告
- 2025-2030全球集成存储解决方案行业调研及趋势分析报告
- 思想道德修养与法律基础
- 罗湖区政府投资项目代建合同范本
- 水电专业承包合同
- 政府采购项目的采购合同
- 大型高炮广告牌制作合同
- 译林版七年级下册英语单词默写表
- 人教版五年级上册数学简便计算大全600题及答案
- 2016-2023年湖南高速铁路职业技术学院高职单招(英语/数学/语文)笔试历年考点试题甄选合集含答案解析
- 政治单招考试重点知识点
- 专题01 中华传统文化-中考英语时文阅读专项训练
- 北京四合院介绍课件
- 页眉和页脚基本知识课件
- 《国有企业采购操作规范》【2023修订版】
- 土法吊装施工方案
- BLM战略规划培训与实战
- GB/T 16475-2023变形铝及铝合金产品状态代号
评论
0/150
提交评论