High-Dimensional OLAP: A Minimal Cubing Approach

Purpose
- How to perform cubing efficiently in high-dimensional data warehouses
- This paper proposes a novel method that computes a thin layer of the data cube together with associated value-list indices

Introduction
- The data cube has been playing an essential role in the implementation of fast OLAP operations
- Many efficient cube computation algorithms have been proposed:
  - Multiway array aggregation
  - BUC
  - H-cubing
  - Star-cubing

Introduction (cont.)
- A traditional data warehouse may have about 10 dimensions but more than 10^9 tuples
- In bioinformatics and text processing, data are high in dimensionality (over 100 or even 1000 dimensions) but only medium in size, e.g. around 10^6 tuples
- Existing methods are too costly in computation time and storage space for high-dimensional OLAP

Introduction (cont.)
- A new method called shell fragments
- Vertically partition a high-dimensional data set into a set of disjoint low-dimensional data sets
- For each fragment, compute its local data cube offline
- At query time, assemble these fragments online

Analysis: Curse of Dimensionality
- A high-dimensional data cube requires massive memory and disk space
- Current algorithms are unable to materialize the full cube under such conditions

Iceberg Cube
- Compute only the cuboid cells whose count or other aggregate satisfies the condition: HAVING COUNT(*) >= minsup
- Motivation
  - Only a small portion of cube cells may be "above the water" in a sparse cube
  - Only calculate "interesting" data, i.e. data above a certain threshold

Problems of the Iceberg Cube
- First, if a high-dimensional cell has support that already passes the iceberg threshold, it cannot be pruned by the iceberg condition and will still generate a huge number of cells.
  - A base cuboid cell "(a1, a2, ..., a60): 5" (i.e., with count 5) will still generate 2^60 iceberg cube cells.

Problems of the Iceberg Cube (cont.)
- Second, it is difficult to set an appropriate iceberg threshold. A threshold that is too low will still generate a huge cube, but one that is too high may invalidate many useful applications.
- Third, an iceberg cube cannot be incrementally updated.
- The same problems affect the Dwarf cube and the quotient cube.

I/O Problem
- Substantial I/O overhead for accessing a fully materialized data cube
- Cuboids are stored on disk in some fixed order, and that order might be incompatible with a particular query.

Current Partial Solution
- Compute a thin cube shell
  - Cuboids with maybe 3 dimensions or fewer in a 60-dimensional cube
- Many problems remain:
  - Still need to compute a lot of cuboids
  - Does not support OLAP over 4 dimensions
  - Cannot support drilling

Computation Model
- Semi-online computation model with certain pre-processing
- Observation: an OLAP query will
  - ignore many dimensions (i.e., treat them as irrelevant)
  - fix some dimensions (e.g., using query constants as instantiations)
  - leave only a few to be manipulated (for drilling, pivoting, etc.)

OLAP Operations

Precomputation of Shell Fragments

Inverted Index
- Lemma 1: The inverted index table uses the same amount of storage space as the original database

Shell Fragments
- All the dimensions of a data set are partitioned into independent groups, called fragments.
- For each fragment, we compute the complete local data cube while retaining the inverted indices.
- For (A1, ..., A60), shell fragments of size 3 need 140 cuboids (20 fragments with 2^3 - 1 = 7 cuboids each), while a cube shell of size 3 needs 36,050 cuboids.

Example: Fragments (A, B, C) and (D, E)
- For each fragment, we compute the complete data cube by intersecting the tid-lists
- {a1 b2 *}
- Cuboid DE

Lemma 2
- Given a database of T tuples and D dimensions, the amount of memory needed to store the shell fragments of size F is O(T (D/F) (2^F - 1))

Computing Other Measures
- Sum, average
- ID_Measure array

Algorithm for Shell Fragment Computation

Online Query Computation
- Point query
  - Seeks a specific cuboid cell in the original data space.
  - In an n-dimensional data cube (A1, A2, ..., An), a point query has the form (a1, a2, ..., an : M), where M is the inquired measure
  - For dimensions that are irrelevant or aggregated, one can use * as the value.
- Subcube query
  - Seeks a set of cuboid cells in the original data space
  - It is a query in which at least one of the relevant dimensions is inquired, marked with ?
  - Example: <a2, ?, c1, *, ? : count()>

Query Processing
- Query form: <a1, a2, ..., an : M>. Each ai has 3 possible values: an instantiated value, aggregate (*), or inquire (?).
- Steps for instantiated dimensions
  - Gather all the instantiated ai's, if there are any
  - Examine the shell fragment partitions to check which ai's are in the same fragments
  - Retrieve the tid-lists
  - The obtained tid-lists are intersected to derive the instantiated base table.
  - If there are no inquired dimensions, stop; otherwise continue with the inquired dimensions
- Steps for inquired dimensions
  - For each inquired dimension, we retrieve all its possible values and their associated tid-lists.
  - They are intersected with the instantiated base table to form the local base cuboid of the inquired and instantiated dimensions.
  - Any cubing algorithm can be employed to compute the local data cube

Shell Fragment Grouping & Size
- Grouping
  - Domain-specific knowledge can be used for better grouping.
- Size (F)
  - If F is too small, the space required to store the fragment cubes will be small but the time needed to compute queries online will be long.
  - 2 <= F <= 4

Bottom-Up Computation (BUC)
- BUC (Beyer & Ramakrishnan, SIGMOD'99)
- Bottom-up vs. top-down? It depends on how you view it!
- Apriori property:
  - Aggregate the data, then move to the next level
  - If minsup is not met, stop!
- If minsup = 1 ⇒ compute the full CUBE!

Partitioning
- Usually, the entire data set can't fit in main memory
- Sort distinct values, partition into blocks that fit
- Continue processing

Optimizations
- Partiti
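
Returning to the shell-fragment material above, the inverted index, the per-fragment local cubes, and point-query processing can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the slides: the five-row table is made-up toy data, the function names (build_inverted_index, build_fragment_cube, point_query) are hypothetical, and count() is the only measure. Local cubes are stored as plain dictionaries of tid-sets rather than in any optimized layout.

```python
from itertools import combinations, product

# Toy base table (illustrative data, not taken from the slides): tid -> row.
ROWS = {
    1: {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"},
    2: {"A": "a1", "B": "b2", "C": "c1", "D": "d2", "E": "e1"},
    3: {"A": "a1", "B": "b2", "C": "c1", "D": "d1", "E": "e2"},
    4: {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e2"},
    5: {"A": "a2", "B": "b1", "C": "c1", "D": "d1", "E": "e3"},
}
# Assumed grouping of the five dimensions into two disjoint fragments.
FRAGMENTS = [("A", "B", "C"), ("D", "E")]


def build_inverted_index(rows):
    """Map each (dimension, value) pair to the set of tids containing it."""
    index = {}
    for tid, row in rows.items():
        for dim, value in row.items():
            index.setdefault((dim, value), set()).add(tid)
    return index


def build_fragment_cube(fragment, index):
    """Materialize every non-empty cell of one fragment's local cube.

    A cell is a frozenset of (dimension, value) pairs; its tid-list is the
    intersection of the 1-D tid-lists of those pairs. The all-* cell is
    not stored.
    """
    values = {d: sorted(v for (dim, v) in index if dim == d) for d in fragment}
    cube = {}
    for k in range(1, len(fragment) + 1):
        for dims in combinations(fragment, k):
            for combo in product(*(values[d] for d in dims)):
                cell = frozenset(zip(dims, combo))
                tids = set.intersection(*(index[pair] for pair in cell))
                if tids:
                    cube[cell] = tids
    return cube


def point_query(instantiated, fragments, cubes, all_tids):
    """Return count() for a point query; unlisted dimensions are '*'."""
    result = set(all_tids)
    for fragment, cube in zip(fragments, cubes):
        # Instantiated dimensions that fall into this fragment form one cell.
        cell = frozenset((d, v) for d, v in instantiated.items() if d in fragment)
        if cell:
            result &= cube.get(cell, set())
    return len(result)


INDEX = build_inverted_index(ROWS)
CUBES = [build_fragment_cube(f, INDEX) for f in FRAGMENTS]

if __name__ == "__main__":
    # (a1, *, *, d1, * : count()) touches both fragments.
    print(point_query({"A": "a1", "D": "d1"}, FRAGMENTS, CUBES, ROWS))
```

In this sketch the inverted index holds exactly one tid entry per table cell (T × D entries in total), which matches the storage claim of Lemma 1. A query whose instantiated dimensions span several fragments looks up one precomputed cell per fragment and intersects the resulting tid-lists online.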