ChapterMiningFrequentPatternsAssociationand优质获奖课件_第1页
ChapterMiningFrequentPatternsAssociationand优质获奖课件_第2页
ChapterMiningFrequentPatternsAssociationand优质获奖课件_第3页
ChapterMiningFrequentPatternsAssociationand优质获奖课件_第4页
ChapterMiningFrequentPatternsAssociationand优质获奖课件_第5页
已阅读5页,还剩36页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

03May2023DataMining:ConceptsandTechniques1Chapter5:MiningFrequentPatterns,AssociationandCorrelationsBasicconceptsandaroadmapEfficientandscalablefrequentitemsetminingmethodsConstraint-basedassociationminingSummary03May2023DataMining:ConceptsandTechniques2WhatIsFrequentPatternAnalysis?Frequentpattern:apattern(asetofitems,subsequences,substructures,etc.)thatoccursfrequentlyinadatasetFirstproposedbyAgrawal,Imielinski,andSwami[AIS93]inthecontextoffrequentitemsetsandassociationruleminingMotivation:FindinginherentregularitiesindataWhatproductswereoftenpurchasedtogether?—Beeranddiapers?!WhatarethesubsequentpurchasesafterbuyingaPC?WhatkindsofDNAaresensitivetothisnewdrug?Canweautomaticallyclassifywebdocuments?ApplicationsBasketdataanalysis,cross-marketing,catalogdesign,salecampaignanalysis,Weblog(clickstream)analysis,andDNAsequenceanalysis.03May2023DataMining:ConceptsandTechniques3WhyIsFreq.PatternMiningImportant?DisclosesanintrinsicandimportantpropertyofdatasetsFormsthefoundationformanyessentialdataminingtasksAssociation,correlation,andcausalityanalysisSequential,structural(e.g.,sub-graph)patternsPatternanalysisinspatiotemporal,multimedia,time-series,andstreamdataClassification:associativeclassificationClusteranalysis:frequentpattern-basedclusteringDatawarehousing:icebergcubeandcube-gradientSemanticdatacompression:fasciclesBroadapplications03May2023DataMining:ConceptsandTechniques4BasicConcepts:FrequentPatternsandAssociationRulesItemsetX={x1,…,xk}FindalltherulesXY

withminimumsupportandconfidencesupport,s,probabilitythatatransactioncontainsXYconfidence,c,

conditionalprobabilitythatatransactionhavingXalsocontainsYLetsupmin=50%,confmin=50%Freq.Pat.:{A:3,B:3,D:4,E:3,AD:3}Associationrules:AD(60%,100%)DA(60%,75%)CustomerbuysdiaperCustomerbuysbothCustomerbuysbeerTransaction-idItemsbought10A,B,D20A,C,D30A,D,E40B,E,F50B,C,D,E,F03May2023DataMining:ConceptsandTechniques5ClosedPatternsandMax-PatternsAlongpatterncontainsacombinatorialnumberofsub-patterns,e.g.,{a1,…,a100}contains(1001)+(1002)+…+(110000)=2100–1=1.27*1030sub-patterns!Solution:Mineclosedpatternsandmax-patternsinsteadAnitemsetX

isclosedifXisfrequentandthereexistsnosuper-patternYכX,withthesamesupportasX(proposedbyPasquier,etal.@ICDT’99)AnitemsetXisamax-patternifXisfrequentandthereexistsnofrequentsuper-patternYכX(proposedbyBayardo@SIGMOD’98)Closedpatternisalosslesscompressionoffreq.patternsReducingthe#ofpatternsandrules03May2023DataMining:ConceptsandTechniques6ClosedPatternsandMax-PatternsExercise.DB={<a1,…,a100>,<a1,…,a50>}Min_sup=1.Whatisthesetofcloseditemset?<a1,…,a100>:1<a1,…,a50>:2Whatisthesetofmax-pattern?<a1,…,a100>:1Whatisthesetofallpatterns?!!03May2023DataMining:ConceptsandTechniques7Chapter5:MiningFrequentPatterns,AssociationandCorrelationsBasicconceptsandaroadmapEfficientandscalablefrequentitemsetminingmethodsConstraint-basedassociationminingSummary03May2023DataMining:ConceptsandTechniques8ScalableMethodsforMiningFrequentPatternsThedownwardclosurepropertyoffrequentpatternsAnysubsetofafrequentitemsetmustbefrequentIf{beer,diaper,nuts}isfrequent,sois{beer,diaper}i.e.,everytransactionhaving{beer,diaper,nuts}alsocontains{beer,diaper}Scalableminingmethods:ThreemajorapproachesApriori(Agrawal&Srikant@VLDB’94)Freq.patterngrowth(FPgrowth—Han,Pei&Yin@SIGMOD’00)Verticaldataformatapproach(Charm—Zaki&Hsiao@SDM’02)03May2023DataMining:ConceptsandTechniques9Apriori:ACandidateGeneration-and-TestApproachAprioripruningprinciple:Ifthereisanyitemsetwhichisinfrequent,itssupersetshouldnotbegenerated/tested!(Agrawal&Srikant@VLDB’94,Mannila,etal.@KDD’94)Method:Initially,scanDBoncetogetfrequent1-itemsetGeneratelength(k+1)candidateitemsetsfromlengthkfrequentitemsetsTestthecandidatesagainstDBTerminatewhennofrequentorcandidatesetcanbegenerated03May2023DataMining:ConceptsandTechniques10TheAprioriAlgorithm—AnExampleDatabaseTDB1stscanC1L1L2C2C22ndscanC3L33rdscanTidItems10A,C,D20B,C,E30A,B,C,E40B,EItemsetsup{A}2{B}3{C}3{D}1{E}3Itemsetsup{A}2{B}3{C}3{E}3Itemset{A,B}{A,C}{A,E}{B,C}{B,E}{C,E}Itemsetsup{A,B}1{A,C}2{A,E}1{B,C}2{B,E}3{C,E}2Itemsetsup{A,C}2{B,C}2{B,E}3{C,E}2Itemset{B,C,E}Itemsetsup{B,C,E}2Supmin=203May2023DataMining:ConceptsandTechniques11TheAprioriAlgorithmPseudo-code:Ck:CandidateitemsetofsizekLk:frequentitemsetofsizekL1={frequentitems};for

(k=1;Lk!=;k++)dobegin

Ck+1=candidatesgeneratedfromLk;

foreachtransactiontindatabasedo

incrementthecountofallcandidatesinCk+1thatarecontainedint

Lk+1=candidatesinCk+1withmin_support

endreturn

k

Lk;03May2023DataMining:ConceptsandTechniques12ImportantDetailsofAprioriHowtogeneratecandidates?Step1:self-joiningLkStep2:pruningHowtocountsupportsofcandidates?ExampleofCandidate-generationL3={abc,abd,acd,ace,bcd}Self-joining:L3*L3abcdfromabcandabdacdefromacdandacePruning:acdeisremovedbecauseadeisnotinL3C4={abcd}03May2023DataMining:ConceptsandTechniques13HowtoGenerateCandidates?SupposetheitemsinLk-1arelistedinanorderStep1:self-joiningLk-1

insertinto

Ckselectp.item1,p.item2,…,p.itemk-1,q.itemk-1fromLk-1p,Lk-1qwherep.item1=q.item1,…,p.itemk-2=q.itemk-2,p.itemk-1<q.itemk-1Step2:pruningforallitemsetscinCk

doforall(k-1)-subsetssofcdoif(sisnotinLk-1)thendeletecfromCk03May2023DataMining:ConceptsandTechniques14ChallengesofFrequentPatternMiningChallengesMultiplescansoftransactiondatabaseHugenumberofcandidatesTediousworkloadofsupportcountingforcandidatesImprovingApriori:generalideasReducepassesoftransactiondatabasescansShrinknumberofcandidatesFacilitatesupportcountingofcandidates03May2023DataMining:ConceptsandTechniques15SamplingforFrequentPatternsSelectasampleoforiginaldatabase,minefrequentpatternswithinsampleusingAprioriScandatabaseoncetoverifyfrequentitemsetsfoundinsample,onlyborders

ofclosureoffrequentpatternsarecheckedExample:checkabcdinsteadofab,ac,…,etc.ScandatabaseagaintofindmissedfrequentpatternsH.Toivonen.Samplinglargedatabasesforassociationrules.InVLDB’9603May2023DataMining:ConceptsandTechniques16BottleneckofFrequent-patternMiningMultipledatabasescansarecostlyMininglongpatternsneedsmanypassesofscanningandgenerateslotsofcandidatesTofindfrequentitemseti1i2…i100#ofscans:100#ofCandidates:(1001)+(1002)+…+(110000)=2100-1=1.27*1030!Bottleneck:candidate-generation-and-testCanweavoidcandidategeneration?03May2023DataMining:ConceptsandTechniques17Chapter5:MiningFrequentPatterns,AssociationandCorrelationsBasicconceptsandaroadmapEfficientandscalablefrequentitemsetminingmethodsConstraint-basedassociationminingSummary03May2023DataMining:ConceptsandTechniques18Constraint-based(Query-Directed)MiningFindingallthepatternsinadatabaseautonomously?—unrealistic!Thepatternscouldbetoomanybutnotfocused!DataminingshouldbeaninteractiveprocessUserdirectswhattobeminedusingadataminingquerylanguage(oragraphicaluserinterface)Constraint-basedminingUserflexibility:providesconstraintsonwhattobeminedSystemoptimization:exploressuchconstraintsforefficientmining—constraint-basedmining03May2023DataMining:ConceptsandTechniques19ConstraintsinDataMiningKnowledgetypeconstraint:classification,association,etc.Dataconstraint

—usingSQL-likequeriesfindproductpairssoldtogetherinstoresinChicagoinDec.’02Dimension/levelconstraintinrelevancetoregion,price,brand,customercategoryRule(orpattern)constraintsmallsales(price<$10)triggersbigsales(sum>$200)Interestingnessconstraintstrongrules:min_support3%,min_confidence60%03May2023DataMining:ConceptsandTechniques20ConstrainedMiningvs.Constraint-BasedSearchConstrainedminingvs.constraint-basedsearch/reasoningBothareaimedatreducingsearchspaceFindingallpatternssatisfyingconstraintsvs.findingsome(orone)answerinconstraint-basedsearchinAIConstraint-pushingvs.heuristicsearchItisaninterestingresearchproblemonhowtointegratethemConstrainedminingvs.queryprocessinginDBMSDatabasequeryprocessingrequirestofindallConstrainedpatternminingsharesasimilarphilosophyaspushingselectionsdeeplyinqueryprocessing03May2023DataMining:ConceptsandTechniques21TheAprioriAlgorithm—ExampleDatabaseDScanDC1L1L2C2C2ScanDC3L3ScanD03May2023DataMining:ConceptsandTechniques22NaïveAlgorithm:Apriori+ConstraintDatabaseDScanDC1L1L2C2C2ScanDC3L3ScanDConstraint:Sum{S.price}<503May2023DataMining:ConceptsandTechniques23Chapter5:MiningFrequentPatterns,AssociationandCorrelationsBasicconceptsandaroadmapEfficientandscalablefrequentitemsetminingmethodsConstraint-basedassociationminingSummary03May2023DataMining:ConceptsandTechniques24Frequent-PatternMining:SummaryFrequentpatternmining—animportanttaskindataminingScalablefrequentpatternminingmethodsApriori(Candidategeneration&test)Projection-based(FPgrowth,CLOSET+,...)Verticalformatapproach(CHARM,...)MiningavarietyofrulesandinterestingpatternsConstraint-basedminingMiningsequentialandstructuredpatternsExtensionsandapplications03May2023DataMining:ConceptsandTechniques25Frequent-PatternMining:ResearchProblemsMiningfault-tolerantfrequent,sequentialandstructuredpatternsPatternsallowslimitedfaults(insertion,deletion,mutation)MiningtrulyinterestingpatternsSurprising,novel,concise,…ApplicationexplorationE.g.,DNAsequenceanalysisandbio-patternclassification“Invisible”datamining03May2023DataMining:ConceptsandTechniques26Ref:BasicConceptsofFrequentPatternMining(AssociationRules)R.Agrawal,T.Imielinski,andA.Swami.Miningassociationrulesbetweensetsofitemsinlargedatabases.SIGMOD'93.(Max-pattern)R.J.Bayardo.Efficientlymininglongpatternsfromdatabases.SIGMOD'98.(Closed-pattern)N.Pasquier,Y.Bastide,R.Taouil,andL.Lakhal.Discoveringfrequentcloseditemsetsforassociationrules.ICDT'99.(Sequentialpattern)R.AgrawalandR.Srikant.Miningsequentialpatterns.ICDE'9503May2023DataMining:ConceptsandTechniques27Ref:AprioriandItsImprovementsR.AgrawalandR.Srikant.Fastalgorithmsforminingassociationrules.VLDB'94.H.Mannila,H.Toivonen,andA.I.Verkamo.Efficientalgorithmsfordiscoveringassociationrules.KDD'94.A.Savasere,E.Omiecinski,andS.Navathe.Anefficientalgorithmforminingassociationrulesinlargedatabases.VLDB'95.J.S.Park,M.S.Chen,andP.S.Yu.Aneffectivehash-basedalgorithmforminingassociationrules.SIGMOD'95.H.Toivonen.Samplinglargedatabasesforassociationrules.VLDB'96.S.Brin,R.Motwani,J.D.Ullman,andS.Tsur.Dynamicitemsetcountingandimplicationrulesformarketbasketanalysis.SIGMOD'97.S.Sarawagi,S.Thomas,andR.Agrawal.Integratingassociationruleminingwithrelationaldatabasesystems:Alternativesandimplications.SIGMOD'98.03May2023DataMining:ConceptsandTechniques28Ref:Depth-First,Projection-BasedFPMiningR.Agarwal,C.Aggarwal,andV.V.V.Prasad.Atreeprojectionalgorithmforgenerationoffrequentitemsets.J.ParallelandDistributedComputing:02.J.Han,J.Pei,andY.Yin.Miningfrequentpatternswithoutcandidategeneration.SIGMOD’00.J.Pei,J.Han,andR.Mao.CLOSET:AnEfficientAlgorithmforMiningFrequentClosedItemsets.DMKD'00.J.Liu,Y.Pan,K.Wang,andJ.Han.MiningFrequentItemSetsbyOpportunisticProjection.KDD'02.J.Han,J.Wang,Y.Lu,andP.Tzvetkov.MiningTop-KFrequentClosedPatternswithoutMinimumSupport.ICDM'02.J.Wang,J.Han,andJ.Pei.CLOSET+:SearchingfortheBestStrategiesforMiningFrequentClosedItemsets.KDD'03.G.Liu,H.Lu,W.Lou,J.X.Yu.OnComputing,StoringandQueryingFrequentPatterns.KDD'03.03May2023DataMining:ConceptsandTechniques29Ref:VerticalFormatandRowEnumerationMethodsM.J.Zaki,S.Parthasarathy,M.Ogihara,andW.Li.Parallelalgorithmfordiscoveryofassociationrules.DAMI:97.ZakiandHsiao.CHARM:AnEfficientAlgorithmforClosedItemsetMining,SDM'02.C.Bucila,J.Gehrke,D.Kifer,andW.White.DualMiner:ADual-PruningAlgorithmforItemsetswithConstraints.KDD’02.F.Pan,G.Cong,A.K.H.Tung,J.Yang,andM.Zaki,CARPENTER:FindingClosedPatternsinLongBiologicalDatasets.KDD'03.03May2023DataMining:ConceptsandTechniques30Ref:MiningMulti-LevelandQuantitativeRulesR.SrikantandR.Agrawal.Mininggeneralizedassociationrules.VLDB'95.J.HanandY.Fu.Discoveryofmultiple-levelassociationrulesfromlargedatabases.VLDB'95.R.SrikantandR.Agrawal.Miningquantitativeassociationrulesinlargerelationaltables.SIGMOD'96.T.Fukuda,Y.Morimoto,S.Morishita,andT.Tokuyama.Dataminingusingtwo-dimensionaloptimizedassociationrules:Scheme,algorithms,andvisualization.SIGMOD'96.K.Yoda,T.Fukuda,Y.Morimoto,S.Morishita,andT.Tokuyama.Computingoptimizedrectilinearregionsforassociationrules.KDD'97.R.J.MillerandY.Yang.Associationrulesoverintervaldata.SIGMOD'97.Y.AumannandY.Lindell.AStatisticalTheoryforQuantitativeAssociationRulesKDD'99.03May2023DataMining:ConceptsandTechniques31Ref:MiningCorrelationsandInterestingRulesM.Klemettinen,H.Mannila,P.Ronkainen,H.Toivonen,andA.I.Verkamo.Findinginterestingrulesfromlargesetsofdiscoveredassociationrules.CIKM'94.S.Brin,R.Motwani,andC.Silverstein.Beyondmarketbasket:Generalizingassociationrulestocorrelations.SIGMOD'97.C.Silverstein,S.Brin,R.Motwani,andJ.Ullman.Scalabletechniquesforminingcausalstructures.VLDB'98.P.-N.Tan,V.Kumar,andJ.Srivastava.SelectingtheRightInterestingnessMeasureforAssociationPatterns.KDD'02.E.Omiecinski.AlternativeInterestMeasuresforMiningAssociations.TKDE’03.Y.K.Lee,W.Y.Kim,Y.D.Cai,andJ.Han.CoMine:EfficientMiningofCorrelatedPatterns.ICDM’03.03May2023DataMining:ConceptsandTechniques32Ref:MiningOtherKindsofRulesR.Meo,G.Psaila,andS.Ceri.AnewSQL-likeoperatorforminingassociationrules.VLDB'96.B.Lent,A.Swami,andJ.Widom.Clusteringassociationrules.ICDE'97.A.Savasere,E.Omiecinski,andS.Navathe.Miningforstrongnegativeassociationsinalargedatabaseofcustomertransactions.ICDE'98.D.Tsur,J.D.Ullman,S.Abitboul,C.Clifton,R.Motwani,andS.Nestorov.Queryflocks:Ageneralizationofassociation-rulemining.SIGMOD'98.F.Korn,A.Labrinidis,Y.Kotidis,andC.Faloutsos.Ratiorules:Anewparadigmforfast,quantifiabledatamining.VLDB'98.K.Wang,S.Zhou,J.Han.ProfitMining:FromPatternstoActions.EDBT’02.03May2023DataMining:ConceptsandTechniques33Ref:Constraint-BasedPatternMiningR.Srikant,Q.Vu,andR.Agrawal.Miningassociationruleswithitemconstraints.KDD'97.R.Ng,L.V.S.Lakshmanan,J.Han&A.Pang.Exploratoryminingandpruningoptimizationsofconstrainedassociationrules.SIGMOD’98.M.N.Garofalakis,R.Rastogi,K.Shim:SPIRIT:SequentialPatternMiningwithRegularExpressionConstraints.VLDB’99.G.Grahne,L.Lakshmanan,andX.Wang.Efficientminingofconstrainedcorrelatedsets.ICDE'00.J.Pei,J.Han,andL.V.S.Lakshmanan.MiningFrequentItemsetswithConvertibleConstraints.ICDE'01.J.Pei,J.Han,andW.Wang,MiningSequentialPatternswithConstraintsinLargeDatabases,CIKM'02.03May2023DataMining:ConceptsandTechniques34Ref:MiningSequentialandStructuredPatternsR.SrikantandR.Agrawal.Miningsequentialpatterns:Generalizationsandperformanceimprovements.EDBT’96.H.Mannila,HToivonen,andA.I.Verkamo.Discoveryoffrequentepisodesineventsequences.DAMI:97.M.Zaki.SPADE:AnEfficientAlgorithmforMiningFrequentSequences.MachineLearning:01.J.Pei,J.Han,H.Pinto,Q.Chen,U.Dayal,andM.-C.Hsu.PrefixSpan:MiningSequentialPatternsEfficientlybyPrefix-ProjectedPatternGrowth.ICDE'01.M.KuramochiandG.Karypis.FrequentSubgraphDiscovery.ICDM'01.X.Yan,J.Han,andR.Afshar.CloSpan:MiningClosedSequentialPatternsinLargeDatasets.SDM'03.X.YanandJ.Han.CloseGraph:MiningClosedFrequentGraphPatterns.KDD'03.03May2023DataMining:ConceptsandTechniques35Ref:MiningSpatial,Multimedia,andWebDataK.KoperskiandJ.Han,DiscoveryofSpatialAssociationRulesinGeographicInformationDatabases,SSD’95.O.R.Zaiane,M.Xin,J.Han,DiscoveringWebAccessPatternsandTrendsbyApplyingOLAPandDataMiningTechnologyonWebLogs.ADL'98.O.R.Zaiane,J.Han,andH.Zhu,MiningRecurrentItemsinMultimediawithProgressiveResolutionRefinement.ICDE'00.D.GunopulosandI.Tsoukatos.EfficientMiningofSpatiotemporalPatterns.SSTD'01.03May2023DataMining:ConceptsandTechniques36Ref:MiningFrequentPatternsinTime-SeriesDataB.Ozden,S.Ramaswamy,andA.Silberschatz.Cyclicassociationrules.ICDE'98.J.Han,G.DongandY.Yin,EfficientMiningofPartialPeriodicPatternsinTimeSeriesDatabase,ICDE'99.H.Lu,L.Feng,andJ.Han.BeyondIntra-TransactionAssociationAnalysis:MiningMulti-DimensionalInter-TransactionAssociationRules.TOIS:00.B.-K.Yi,N.Sidiropoulos,T.Johnson,H.V.Jagadish,C.Faloutsos,andA.Biliris.OnlineDataMiningforCo-EvolvingTimeSequences.ICDE'00.W.Wang,J.Yang,R.Muntz.TAR:TemporalAssociationRulesonEvolvingNumericalAttributes.ICDE’01.J.Yang,W.Wang,P.S.Yu.MiningAsynchronousPeriodicPatternsinTimeSeriesData.TKDE’03.03May2023DataMining:ConceptsandTechniques37Ref:IcebergCubeandCubeComputationS.Agarwal,R.Agrawal,P.M.Deshpande,A.Gupta,J.F.Naughton,R.Ramakrishnan,andS.Sarawagi.Onthecomputationofmultidimensionalaggregates.VLDB'96.Y.Zhao,P.M.Deshpande,andJ.F.Naughton.Anarray-basedalgorithmforsimultaneousmultidi-mensionalaggregates.SIGMOD'97.J.Gray,etal.Datacube:Arelationalaggregationoperatorgeneralizinggroup-by,cross-tabandsub-totals.DAMI:97.M.Fang,N.Shivakumar,H.Garcia-Molina,R.Motwani,andJ.D.Ullman.Computingicebergqueriesefficiently.VLDB'98.S.Sarawagi,R.Agrawal,andN.Megiddo.Discovery-drivenexplorationofOLAPdatacubes.EDBT'98.K.BeyerandR.Ramakrishnan.Bottom-upcomputationofsparseandicebergcubes.SIGMOD'99.03May2023DataMining:ConceptsandTechniques38Ref:IcebergCubeandCubeExplorationJ.Han,J.Pei,G.Dong,andK.Wang,ComputingIcebergDataCubeswithComplexMeasures.SIGMOD’01.W.Wang,H.Lu,J.Feng,andJ.X.Yu.CondensedCube:AnEffectiveApproachtoReducingDataCubeSize.ICDE'02.G.Dong,J.Han,J.Lam,J.Pei,andK.Wang.MiningMulti-DimensionalConstrainedGradientsinDataCubes.VLDB'01.T.Imielinski,L.Khachiyan,andA.Abdulghani.Cubegrades:Generalizingassociationrules.DAMI:02.L.V.S.Lakshmanan,J.Pei,a

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论