




High-Dimensional OLAP: A Minimal Cubing Approach

Purpose
- How to perform cubing efficiently in high-dimensional data warehouses.
- This paper proposes a novel method that computes a thin layer of the data cube together with associated value-list indices.

Introduction
- The data cube has played an essential role in the implementation of fast OLAP operations.
- Many efficient cube computation algorithms have been proposed:
- Multiway array aggregation
- BUC
- H-cubing
- Star-cubing

Introduction (cont.)
- A traditional data warehouse may have about 10 dimensions but more than 10^9 tuples.
- In bioinformatics and text processing, data are high in dimensionality (over 100 or even 1000 dimensions) but only medium in size, e.g., around 10^6 tuples.
- Existing methods are too costly in computation time and storage space for high-dimensional OLAP.

Introduction (cont.)
- A new method called shell fragment.
- Vertically partition a high-dimensional dataset into a set of disjoint low-dimensional datasets.
- For each fragment, compute its local data cube offline.
- At query time, assemble these fragments online.

Analysis
- Curse of dimensionality
- A high-dimensional data cube requires massive memory and disk space.
- Current algorithms are unable to materialize the full cube under such conditions.

Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies the condition: HAVING COUNT(*) >= minsup
- Motivation
- Only a small portion of cube cells may be "above the water" in a sparse cube.
- Only calculate "interesting" data: data above a certain threshold.

Problem of Iceberg Cube
- First, if a high-dimensional cell has support that already passes the iceberg threshold, it cannot be pruned by the iceberg condition and will still generate a huge number of cells.
- A base cuboid cell "(a1, a2, ..., a60): 5" (i.e., with count 5) will still generate 2^60 iceberg cube cells.

Problem of Iceberg Cube (cont.)
- Second, it is difficult to set an appropriate iceberg threshold. Too low a threshold will still generate a huge cube, but too high a threshold may invalidate many useful applications.
- Third, an iceberg cube cannot be incrementally updated.
- The same situation occurs with the dwarf cube and the quotient cube.

I/O Problem
- Substantial I/O overhead for accessing a fully materialized data cube.
- Cuboids are stored on disk in some fixed order, and that order might be incompatible with a particular query.

Current Partial Solution
- Compute a thin cube shell: cuboids with maybe 3 dimensions or fewer in a 60-dimensional cube.
- Many problems remain:
- Still need to compute a lot of cuboids.
- Does not support OLAP on 4 or more dimensions.
- Cannot support drilling.

Computation Model
- Semi-online computation model with certain pre-processing.
- Observation: an OLAP query tends to
- ignore many dimensions (i.e., treating them as irrelevant),
- fix some dimensions (e.g., using query constants as instantiations),
- leave only a few to be manipulated (for drilling, pivoting, etc.).

OLAP Operations

Precomputation of Shell Fragments
- Inverted index
- Lemma 1: The inverted index table uses the same amount of storage space as the original database.

Shell Fragments
- All the dimensions of a dataset are partitioned into independent groups, called fragments.
- For each fragment, we compute the complete local data cube while retaining the inverted indices.
- For (A1, ..., A60) with fragments of size 3, there are 140 cuboids (20 fragments with 2^3 - 1 = 7 cuboids each), while a cube shell of size 3 has 36050 cuboids.

Example
- Fragments (A, B, C) and (D, E)
- For each fragment, we compute the complete data cube by intersecting the tid-lists.
- Cell {a1 b2 *}
- Cuboid DE

Lemma 2
- Given a database of T tuples and D dimensions, the amount of memory needed to store the shell fragments of size F is O(T (D/F) (2^F - 1)).

Computing Other Measures
- Sum, average
- ID_Measure array

Algorithm for Shell Fragment Computation

Online Query Computation
- Point query
- Seeks a specific cuboid cell in the original data space.
- In an n-dimensional data cube (A1, A2, ..., An), a point query has the form (a1, a2, ..., an : M), where M is the inquired measure.
- For dimensions that are irrelevant or aggregated, one can use * as the value.
- Subcube query
- Seeks a set of cuboid cells in the original data space.
- It is a query in which at least one of the relevant dimensions is inquired, marked ?.
- Example: <a2, ?, c1, *, ? : count()>

Query Processing
- Query form: <a1, a2, ..., an : M>. Each ai has 3 possible values: an instantiated value, aggregate *, or inquire ?.
- Steps for instantiated dimensions
- Gather all the instantiated ai's, if there are any.
- Examine the shell fragment partitions to check which ai's are in the same fragments.
- Retrieve the tid-lists.
- The obtained tid-lists are intersected to derive the instantiated base table.
- If there are no inquired dimensions, stop; otherwise continue with the inquired dimensions.
- Steps for inquired dimensions
- For each inquired dimension, we retrieve all its possible values and their associated tid-lists.
- They are intersected with the instantiated base table to form the local base cuboid of the inquired and instantiated dimensions.
- Any cubing algorithm can be employed to compute the local data cube.

Shell Fragment Grouping & Size
- Grouping
- Domain-specific knowledge can be used for better grouping.
- Size (F)
- If F is too small, the space required to store the fragment cubes will be small but the time needed to compute queries online will be long.
- 2 <= F <= 4

Bottom-Up Computation (BUC)
- BUC (Beyer & Ramakrishnan, SIGMOD'99)
- Bottom-up vs. top-down? It depends on how you view it!
- Apriori property:
- Aggregate the data, then move to the next level.
- If minsup is not met, stop!
- If minsup = 1, compute the full CUBE!

Partitioning
- Usually, the entire data set can't fit in main memory.
- Sort distinct values, partition into blocks that fit.
- Continue processing.

Optimizations
- Partiti
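
The shell fragment steps described earlier (build an inverted index, compute each fragment's local cube by intersecting tid-lists, then answer a point query by intersecting tid-lists across fragments) can be illustrated with a minimal Python sketch. The toy table, the fragment split (A, B, C) / (D, E), and all function names here are assumptions made for illustration; only the count measure is handled, and this is not the authors' implementation.

```python
from itertools import combinations, product

# Toy table (illustrative assumption, not taken from the slides): tid -> row.
ROWS = {
    1: {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"},
    2: {"A": "a1", "B": "b2", "C": "c1", "D": "d2", "E": "e1"},
    3: {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": "e2"},
    4: {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e2"},
    5: {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e3"},
}
FRAGMENTS = [("A", "B", "C"), ("D", "E")]  # hypothetical grouping


def build_inverted_index(rows):
    """Map each (dimension, value) pair to the set of tids containing it."""
    index = {}
    for tid, row in rows.items():
        for dim, value in row.items():
            index.setdefault((dim, value), set()).add(tid)
    return index


def build_fragment_cube(fragment, index):
    """Materialize every non-empty cell of one fragment's local cube.

    A cell is a frozenset of (dimension, value) pairs; its tid-list is the
    intersection of the 1-D tid-lists of those pairs.
    """
    values = {d: sorted(v for (dim, v) in index if dim == d) for d in fragment}
    cube = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            for combo in product(*(values[d] for d in dims)):
                pairs = list(zip(dims, combo))
                tids = set.intersection(*(index[p] for p in pairs))
                if tids:
                    cube[frozenset(pairs)] = tids
    return cube


def point_query(instantiated, fragments, cubes, all_tids):
    """Count tuples matching the instantiated dimensions; the rest are '*'."""
    result = set(all_tids)
    for fragment, cube in zip(fragments, cubes):
        pairs = frozenset(
            (d, v) for d, v in instantiated.items() if d in fragment
        )
        if pairs:
            # One lookup per fragment, then intersect across fragments.
            result &= cube.get(pairs, set())
    return len(result)


index = build_inverted_index(ROWS)
cubes = [build_fragment_cube(f, index) for f in FRAGMENTS]

# Both instantiated dimensions live in fragment (A, B, C).
print(point_query({"A": "a1", "B": "b2"}, FRAGMENTS, cubes, ROWS))
# The instantiated dimensions span both fragments.
print(point_query({"B": "b1", "D": "d1"}, FRAGMENTS, cubes, ROWS))
```

In this sketch the inverted index holds exactly one tid entry per table cell (5 tuples by 5 dimensions gives 25 entries), which is the intuition behind Lemma 1. A subcube query would add one more step: for each inquired dimension, intersect every value's tid-list with the instantiated result and run any cubing algorithm on the small table that remains.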