版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
PAGE
PAGE
30
TheOpenUniversityofIsrael
DepartmentofMathematicsandComputerScience
TheComputerScienceDivision
IMPROVINGDATAMININGALGORITHMSUSINGCONSTRAINTS
By
ShaiShimon
ID028455863
Emailshaishimon@
Cell0526-126207
PreparedunderthesupervisionofProfessorEhudGudes
Feb2012
TableofContents
TOC\o"1-5"\h\z
Listoffigures
3
ListofTables
4
1 ABSTRACTANDINTRODUCTION
5
1.1 ABSTRACT
5
1.2 INTRODUCTION
6
2 basicconcepts
8
2.1 INTRODUCION
12
2.2 APRIORI-FastAlgorithmsforMiningAssociationRules[1]
12
2.3 FP-TREEALGORITHM[8]
19
3 PAPERSsurvey
34
3.1 USINGCONSTRAINTS
34
3.1.1 INTRODUCTIONANDMOTIVATION
34
3.1.2 MININGFREQUENTITEMSETSWITHCONVERTIBLECONSTRAINTS[9]
34
Introduction
34
Convertibleconstraints-motivation
35
3.1.3 MININGASSOCIATIONRULESWITHITEMCONSTRAINTS[10]
37
Abstract
37
Introduction
37
Algorithms
37
Tradeoffs
38
Conclusions
38
3.1.4 EXAMINEROPTIMIZEDLEVEL-WISEFREQUENTPATTERNMININGWITHMONOTONECONSTRAINTSALGORITHM[4]
39
Abstract
39
Introduction
39
Definitions
40
ExAMineralgorithm
41
FlowchartofexaMiner
43
ExaMineralgorithmexample
44
Experiments
53
3.1.5 FP-BONSAIALGORITHM[5]
56
Introduction
56
FP-bonsaialgorithm
57
FP-Bonsaialgorithmexample
58
Disadvantage
63
Experiments
64
Summary
65
3.2 shortpaperssurveys
65
4 IMPLEMENTAION
70
5 summaryandconclusions
74
6 REFERENCES
75
Listoffigures
16
passexecutiontimesofAprioriandAprioriTId
Figure2.1.1-1
17
Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis2
Figure2.1.1-2
18
Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis4
Figure2.1.1-3
18
Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis6
Figure2.1.1-4
26
FPgrowsexample
Figure2.1.1-5
26
FPgrowsexampleforp(1)
Figure2.1.1-6
27
FPgrowsexampleforp(2)
Figure2.1.1-7
27
FPgrowsexampleform(1)
Figure2.1.1-8
28
FPgrowsexampleform(2)
Figure2.1.1-9
28
FPgrowsexampleforam
Figure2.1.1-10
29
FPgrowsexampleforcamandfam
Figure2.1.1-11
29
FPgrowsexampleforcamandcm
Figure2.1.1-12
30
FPgrowsexampleforcamandfm
Figure2.1.1-13
30
FPgrowsexampleresults
Figure2.1.1-14
32
FPtreealgorithmexperiment–Runtime,supportthreshold
Figure2.1.1-15
32
FPgrowsalgorithmexperiment–Transactionsnumberwiththreshold=1.5%
Figure2.1.1-16
43
ExAMiner0
Figure3.1.4-1
44
ExAMiner1&ExAMiner2
Figure3.1.4-2
54
ExaMinerexperimentDataReductionRate(min_sup=1100)
Figure3.1.4-3
54
ExaMinerexperimentDataReductionRate(min_sup=500)
Figure3.1.4-4
55
ExaMinerexperimentRuntimesynthetic(min_sup=1200)
Figure3.1.4-5
56
ExaMinerexperimentRuntimesynthetic(sum(prices)>2800)
Figure3.1.4-6
64
FPBonsaiExaminerexperiment(BMS-POS)(1)
Figure3.1.5-1
65
FPBonsaiExaminerexperiment(BMS-POS)(2
Figure3.1.5-2
72
Applicationwindow–mainpanel
Figure4-1
73
Applicationwindow–FPtreeresultpanel
Figure4-2
ListofTables
8
Market-Baskettransactions
Table2-1
9
Convertibleanti-monotone
Table2-2
10
Convertiblemonotone
Table2-3
11
stronglyconvertibleconstraints
Table2-4
22
FPtreealgorithmexampleTid,item,andfrequency(1)
Table2.1.1-1
22
FPtreealgorithmexampleTid,item,andfrequency(2)
Table2.1.1-2
23
FPtreealgorithmexampleTid,item,andfrequency(3)
Table2.1.1-3
23
FPtreealgorithmexampleTid,item,andHeadertable(1)
Table2.1.1-4
24
FPtreealgorithmexampleTid,item,andHeadertable(2)
Table2.1.1-5
24
FPtreealgorithmexampleTid,item,andHeadertable(3)
Table2.1.1-6
27
FtreealgorithmexampleTid,item,andHeadertable(4)
Table2.1.1-7
27
FPtreealgorithmexampleTid,item,andHeadertable(5)
Table2.1.1-8
31
FPtreealgorithmexperiment–Syntheticdataset
Table2.1.1-9
36
TransactionIdANDtransaction
Table3.1.2-1
36
Frequentitemsetswithsupportthreshold
Table3.1.2-2
43
ExaMiner0
Table3.1.4-1-A
44
ExaMiner1
Table3.1.4-1-B
45
ExaMinerexamplelevelone(1)
Table3.1.4-2
45
ExaMinerexamplelevelone(2)
Table3.1.4-3
46
ExaMinerexamplelevelone(3)
Table3.1.4-4
47
ExaMinerexamplelevelone(4)
Table3.1.4-5
48
ExaMinerexamplelevelone(5)
Table3.1.4-6
49
ExaMinerexamplelevelone(6)
Table3.1.4-7
49
ExaMinerexampleleveltwo(1)
Table3.1.4-8
50
ExaMinerexampleleveltwo(2)
Table3.1.4-9
50
ExaMinerexampleleveltwo(4)
Table3.1.4-10
51
ExaMinerexampleleveltwo(5)
Table3.1.4-11
52
ExaMinerexampleleveltwo(7)
Table3.1.4-12
52
ExaMinerexampleleveltwo(7)
Table3.1.4-13
53
ExaMinerexampleleveltwo(7)
Table3.1.4-14
58
FPBonsaiexample-item,valuetable
Table3.1.5-1
59
FPBonsaiexample-Tid,itemstable
Table3.1.5-2
59
FPBonsaiexample-pruning–(constraintcheck)
Table3.1.5-3
60
FPBonsaiexampleRun-pruning(Supportcheck)
Table3.1.5-4
60
FPBonsaiexample-pruning(1)
Table3.1.5-5
61
FPBonsaiexampleRun-pruning(2)
Table3.1.5-6
61
FPBonsaiexample-pruning(3)
Table3.1.5-7
62
FPBonsaiexampleRun-pruning(4)
Table3.1.5-8
62
FPBonsaiexample-pruning(5)
Table3.1.5-9
62
FPBonsaiexampleresults
Table3.1.5-10
71
Transactionstable
Table4-1
71
Itemsandprices
Table4-2
71,72
Experimentsresults
Table4-3
ABSTRACTANDINTRODUCTION
ABSTRACT
Thepurposeofdataminingistoidentifyandpredictpatterns,trendsandrelationshipsindata.Themainstepsindataminingprocessare:
Definingtheproblem,preparationofinformation,dataanalysis,evaluationoftheresults,displayingtheresults.
InthisworkI'llpresentanumberofdataminingalgorithmsusingassociationrules.FirstI'llpresentthebasicalgorithms(AprioriAlgorithmandFPTree)andthenwe'lldiscussalgorithmswithconstraints.Wewillpresentthealgorithmswithconstraintsindetail,andalsoweshalldiscussthedifferencesbetweenthem.
Infactthisworkwillfocusondataminingalgorithmswithconstraints.Wewillfocusontheimportanceofconstraintsindatamining,ontheiruse,andexploredifferenttypesofconstraintsandeffectivemethodsofdataminingalgorithms.Asit'swellknown,sincethesizeofdataminingresultsmaysometimesbeverylarge,usingconstraintshelptheuserfindthedesiredinformationandimprovesthesystemperformance.Thisworkwillfocusoncertaintypesofconstraints,andalgorithmsthatwerebuiltforthem.Specifically,thealgorithmsthatwewillrevieware:
MININGFREQUENTITEMSETSWITHCONVERTIBLECONSTRAINTS[10]
MININGASSOCIATIONRULESWITHITEMCONSTRAINTS[11]
EXAMINEROPTIMIZEDLEVEL-WISEFREQUENTPATTERNMININGWITHMONOTONECONSTRAINTSALGORITHM[4]
FP-BONSAIALGORITHM[5]
Inadditionwewillreviewbrieflysixotherarticles:FourarticlesonconstraintsandtwoadvancedalgorithmsthanApriori.
Thelastphaseoftheworkisanimplementationoftwoalgorithms:Bonsai-treeandFP-tree.TheimplementationwascodedintheJAVAlanguage.TheDatabaseinputisasyntheticdatabaseanditwasbuiltbyarandomgeneratorthatwasespeciallydevelopedforthispurpose.Theresultsandconclusionsoftheevaluationaresummarizedinthepaper.
INTRODUCTION
BACKGROUND
Overtheyears,massstoragecosthasdecreaseddramatically,anddatabasetechnology,incorporatingtheubiquitousInternet,hasevolvedtobemoreintelligentandpowerful.Wearenowattheequinoxwherewehavetoomuchdatayetsofewcomputerizedtoolstoanalyzeit,letaloneapplytheknowledgeresultedfromtheanalysistoexpediteinformationdissemination,scientificresearch,andindustrialandcommercialdecisionmaking.Weareindeeddatabillionaireslivinginthegutterofknowledge.
Thisiswheredataminingcamein,whichstartedoutasadirectconsequenceofinformationtechnologydevelopment.Followingtheamazingprogressinthefield,dataminingcannowprovidetheoreticalfoundationstoimplementanalyzingsoftwareforvariouskindsofapplications.
Thispaperismainlyfocusonhowtoefficientlygenerateassociationrules.
Theuserisallowedtoexpresshisfocusinmining,bymeansofarichclassofconstraintsthatcaptureapplicationsemantics.Besidesallowinguserexplorationandcontrol,theparadigmallowsmanyoftheseconstraintstobepusheddeepinsidemining(laterdiscussedinbasicconcepts),thuspruningthesearchspaceofpatternstothoseofinteresttotheuser,andachievingsuperiorperformance.
Inthisreportwediscuss2maintopicsalgorithmsforconstraintsandefficiencyimprovements.
PURPOSE
Thispaperisdividedto4maintopics:
BasicconceptsandshortbriefonbothAprioriandFP-Treealgorithms–Inthissectionwe'llfocusonthebasicconceptswhichwillhelpusdealwiththerestofthepaperandwe'lldiscussshortlyabout2algorithms:AprioriandFP-Tree.
Papersurvey–Inthissectionweshowtheadvantageoftheconstraints.Withconstraintsweobtainfewerpatternswhicharemoreinteresting.Indeedconstraintsarethewayweusetodefinewhatis“interesting”.Herewe'llintroduce4articles,whichwilluseconstraintsmethodology:
"MiningFrequentItemsetswithConvertibleConstraints"[9]
"Miningassociationruleswithitemconstraints"[10]
"Examineralgorithm"[4]
"FPBonsai"[4]
Shortpapersurvey–Herewe'lldescribebrieflyfewarticleswhichdealbothimprovingbasicalgorithmsandconstraintsalgorithms.
Applicationimplementation–Afterdescribingalltheabovearticles,we'llshowtheresultsofanapplicationwhichwaswritteninjava.Thisapplicationimplements2algorithms:"FPTree"and"FPBonsai".Theprogramwasrunwithdatathatwasgeneratedsyntactically.Theprogramwillshowthedurationofeachalgorithminadditiontotheresults.
basicconcepts
Associationrulesmining-Givenasetoftransactions,findrulesthatwillpredicttheoccurrenceofanitembasedontheoccurrencesofotheritemsinthetransaction
Market-Baskettransactions
ExampleofAssociationRules
{Diapers}{Beer},
{Milk,Bread}{Eggs,Coke},
{Beer,Bread}{Milk},
Table2-1Market-Baskettransactions
Itemset
Acollectionofoneormoreitems
Example:{Milk,Bread,Diapers}
k-itemset
Anitemsetthatcontainskitems
Supportcount(s-sigma)
Frequencyoccurrenceofanitemset
E.g.s({Milk,Bread,Diapers})=2
Support
Thepercentageofthefractionoftransactionsthatcontainanitemsetrepresentsthesupport.Orinotherwordsanitemsetwhichappearsinxtransactionsofthedatabaseisthesupportofthisitemset.
E.g.s({Milk,Bread,Diapers})appearin2transactionsinthetableabovesothesupportis2/5*100=40%
Confidence
Confidencedenotesthestrengthofimplicationintherule,meansthemoretheconfidencehighertherelationshipbetweenthe2setsisstronger.Thecasuallinkbetweenmilkandbreadisstrongintheexamplebecausetheconfidenceis75%.
Confidence(X=>Y)=Support(XY)/Support(X)
E.g.s({Milk,Bread})=3/5
s({Milk})=4/5
Confidence(MilkBread)=(3/5)/(4/5)=0.75->75%
FrequentItemset
Anitemsetwhosesupportisgreaterthanorequaltoaminsupthreshold
AssociationRule
AnimplicationexpressionoftheformX®Y,whereXandYareitemsets
Example:
{Milk,Diapers}®{Beer}
Constraints
Whatareconstraintsindatamining?Constraintsaretherulesenforcedondatatransactions.
TheIdeainconstraintsistofocusonthespecificandrelevantitemsetswhichwewanttomine.Thefollowingbellowsaresomebasicconceptsregardingconstrains.
Constraintsmining–Aimtoreducesearchspace.Itfindallpatternssatisfyingconstraints
Constraintsbasedsearch-Aimtoreducesearchspaceandfindsonlysome(orone)answer
BothConstraintsminingandConstraintsbasedsearchareaimedatreducingsearchspacebutthefirstfindtheallthepatternsandtheotherfindsomeoroneofthepatterns.Thisofcoursemakesthedifferenceintheruntimeandthememoryusage.
Anti-monotonic-WhenanitemsetSviolatestheconstraint,sodoesanyofitssuperset
Example:C:range(S.profit)£15isanti-monotoneItemsetabviolatesC
range(ab)=40£15
Sodoeseverysupersetofab
Monotonic-WhenanitemsetSsatisfiestheconstraint,sodoesanyofitssuperset
Example:C:range(S.profit)³15ItemsetabsatisfiesC
range(ab)=40³15
Sodoeseverysupersetofab
Thefollowingtablesarefortheexamples
Item
Profit
a
40
b
0
c
-20
d
10
e
-30
f
30
g
20
h
-10
TID
Transaction
10
a,b,c,d,f
20
b,c,d,f,g,h
30
a,c,d,e,f
40
c,e,f,g
Table2-2ConvertibleAntimonotone
Convertibleanti-monotone-AssumethereisanorderR.WheneveranitemsetSsatisfiesC,sodoesanyprefixofS.
Example:C:avg(S)³20w.r.t.itemvaluedescendingorder
Theitemset“abc”satisfiesC
avg(abc)=30³20
andsodoes“ab”avg(ab)=35³20and“a”avg(a)=40³20
Convertiblemonotone-AssumethereisanorderR.WheneveranditemsetSviolatesC,sodoesanyprefixofS.
Example:C:avg(S)£20w.r.t.itemvaluedescendingorderThe
itemset“abc”violatesCavg(abc)=30£20andsodoes“ab”
avg(ab)=35£20and“a”avg(a)=40£20
Thefollowingtablesarefortheexamples
Item
Profit
a
40
b
30
c
20
d
10
e
7
f
5
g
3
h
1
TID
Transaction
10
a,b,c,d,f
20
b,c,d,f,g,h
30
a,c,d,e,f
40
c,e,f,g
Table2-3Convertiblemonotone
Stronglyconvertibleconstraints
WheneverthereexistsanorderRoverthesetofitemssuchthatCisconvertible
anti-monotoneRandconvertiblemonotoneR^-1
ExampleC:avg(X)³25isconvertibleanti-monotonew.r.t.itemvaluedescendingorderR
Theitemset“afg”satisfiesCsodoes“af”and“a”.
avg(X)³25isconvertiblemonotonew.r.t.itemvalueascendingorderR^-1
Theitemset“ech”violatesCsodoes“ec”and“e”.
Tabledescendingorder Tableascendingorder
Item
Value
e
1
c
3
h
5
b
8
d
10
g
20
f
30
a
40
Item
Value
a
40
f
30
g
20
d
10
b
8
h
5
c
3
e
1
Table2-4stronglyconvertibleconstraints
Succinctnessconstraints
GivenA1,thesetofitemssatisfyingasuccinctnessconstraintC,thenanysetSsatisfyingCisbasedonA1,i.e.,ScontainsasubsetbelongingtoA1.min(S.Price)£vissuccinctbecauseeachsubsetwhosatisfytheconstraintisasubsetofA1.,sum(S.Price)³visnotsuccinct.
min(S.Price)£v
A1={20,30,40,8,5,3}
V=70
min(A1)<VsatisfytheconstraintsodoeseachsubsetofA1satisfytheconstraint.
sum(A1)>VsatisfytheconstraintbutnoteachsubsetofA1satisfytheconstraint.Forexamplesum(20,30)<70
INTRODUCION
AprioriisthemostsimpleandmostwidelyknownalgorithmforminingfrequentitemsetscreatedbyR.AgrawalandR.Skrikant.
TheApriorialgorithmworksiteratively.Itfirstfindsthesetoflarge1-itemsets,andthensetof2-itemsets,andsoon.Thenumberofscanoverthetransactiondatabaseisasmanyasthelengthofthemaximalitemset.Aprioriisbasedonthefollowingfact:Thesimplebutpowerfulobservationleadstothegenerationofasmallercandidatesetusingthesetoflargeitemsetsfoundinthepreviousiteration.
Disadvantages
Generationofcandidateitemsetsisexpensive(inbothspaceandtime)
UnlikeAprioriFP-growthusesanextendedprefix-treestructuretostorethedatabaseinacompressedform.ItusesapatternfragmentgrowthmethodtoavoidthecostlyprocessofcandidategenerationandtestingusedbyApriori.
APRIORI-FastAlgorithmsforMiningAssociationRules[1]
Algorithmssummarize
Countitemoccurrences
Generatenewk-itemsetscandidates
Findthesupportofallthecandidates
Takeonlythosewithsupportoverminsup
Apriori,firstscansthetransactiondatabasesDinordertocountthesupportofeachitemiinI,anddeterminesthesetoflarge1-itemsets.Thenoneiterationisperformedforeachofthecomputationofthesetof2-itemsets,3-itemsets,andsoon.Thekthiterationconsistsoftwosteps:
GeneratethecandidatesetCkfromthesetoflarge(k-1)-itemsets,Lk-1.
ScanthedatabaseinordertocomputethesupportofeachcandidateitemsetinCk
Thecandidategenerationalgorithmisgivenasfollows:
Thecandidategenerationprocedurecomputesthesetofpotentiallylargek-itemsetsfromthesetoflarge(k-1)-itemsets.Anewcandidatek-itemsetisgeneratedfromtwolarge(k-1)-itemsetsiftheirfirst(k-2)itemsarethesame.ThecandidatesetCkisasupersetofthelargek-itemsets.Thecandidatesetisguaranteedtoincludeallpossiblelargek-itemsetsbecauseofthefactthatallsubsetsofalargeitemsetarealsolarge.SincealllargeitemsetsinLk-1arecheckedforcontributiontocandidateitemset,thecandidatesetCkiscertainlyasupersetoflargek-itemsets.Afterthecandidatesaregenerated,theircountsmustbecomputedinordertodeterminewhichofthemarelarge.Thiscountingstepisreallyimportantintheefficiencyofthealgorithm,becausethesetofthecandidateitemsetsmaybepossiblylarge.Apriorihandlesthisproblembyemployingahashtreeforstoringthecandidate.Thecandidategenerationalgorithmisusedtofindthecandidateitemsetscontainedinatransactionusingthishashtreestructure.ForeachtransactionTinthetransactiondatabaseD,thecandidatescontainedinTarefoundusingthehashtree,andthentheircountsareincremented.AfterexaminingalltransactioninD,theonesthatarelargeareinsertedintoLk.
Theproblemisthateverypassgoesoverthealldata,andit'snoefficientprocess.
TheanswerforthisproblemisaprioriTid.
Usesthedatabaseonlyonce.
BuildsastoragesetC^k
Membershastheform<TID,{Xk}>
Xkarepotentiallylargek-itemsintransactionTI.
Fork=1,C^1isthedatabase.
UsesC^kinpassk+1.
AlgorithmaprioryTid
Advantage
C^kcouldbesmallerthanthedatabase.
Ifatransactiondoesnotcontaink-itemsetcandidates,thanitwillbeexcludedfromC^k.
Forlargek,eachentrymaybesmallerthanthetransaction
Thetransactionmightcontainonlyfewcandidates.
Disadvantage
Forsmallk,eachentrymaybelargerthanthecorrespondingtransaction.
Anentryincludesallk-itemsetscontainedinthetransaction.
Figure2.1.1-1–PerpassexecutiontimesofAprioriandAprioriTId
WecanseeinthefigureabovethatintheearlierpassesaprioridoesbetterperformancebutinthelaterpassesaprioriTidbeatsApriori.That’sbecauseinthelaterpassesthenumberofcandidateitemsetsreduces.AprioriTiddoesn'tusethedatabaseitusesCKinstead.CKbecomesmallerandthat’swhyinthelaterpassesaprioriTidisbetter.
Sowhoisbetter?
Intheearlierpasses,AprioridoesbetterthanAprioriTid.However,AprioriTidbeatsAprioriinlaterpasses.Weobservedsimilarrelativebehaviorfortheotherdatasets,thereasonforwhichisasfollows.AprioriandAprioriTidusethesamecandidategenerationprocedureandthereforecountthesameitemsets.Inthelaterpasses,thenumberofcandidateitemsetsreduces.However,Aprioristillexamineseverytransactioninthedatabase.Ontheotherhand,ratherthanscanningthedatabase,AprioriTidscansCKforobtainingsupportcounts,andthesizeofCKhasbecomesmallerthanthesizeofthedatabase.WhentheCKsetscanfitinmemory,wedonotevenincurthecostofwritingthemtodisk.
Basedontheseobservations,wecandesignahybridalgorithm,whichwecallAprioriHybridthatusesAprioriintheinitialpassesandswitchestoAprioriTidwhenitexpectsthatthesetCKattheendofthepasswillfitinmemory.WeusethefollowingheuristictoestimateifCKwouldfitinmemoryinthenextpass.Attheendofthecurrentpass,wehavethecountsofthecandidate'siinCK.Fromthis,weestimatewhatthesizeofCKwouldhavebeenifithadbeengenerated.Thissize,inwords,is
IfCKinthispasswassmallenoughtofitinmemory,andtherewerefewerlargecandidatesinthecurrentpassthanthepreviouspass,weswitchtoAprioriTid.
Theswitchtakestime,butitstillworthit.WecanseefromthegraphsbellowtheadvantageofAprioryHybridalgorithm.IttakestheadvantagesofbothalgorithmsAprioriandAprioriTid.
T10.12.D100Kandtheothersrepresenttheparametersettings.
|T|-10–Averagesizeofthetransactions.
|I|-2-Averagesizeofthemaximalpotentiallylargeitemsets.
D–100K–Numberoftransactions.
settings12,14,16aretheaveragesizeofthemaximalpotentiallylargeitemsets.
WecanseeinthegraphsbellowthatApriorihasbetterperformancethanAprioriTid.Thereasonissmallnumberofitemsinallthetransactions.
AprioriTidhasgoodperformancewhenthesizeofthetransactionsisbig.BecauseinthespecificexamplesbellowthesizeissmalltheApriorihasbetterperformance.
Figure2.1.1-2–Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis2
Figure2.1.1-3–Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis4
Figure2.1.1-4–Executiontimesfordecreasingminimumsupport(maxpotentiallylargeitemsetis6
InthegraphabovewecanseehowAprioryHybridalgorithmtakestheadvantagesofbothalgorithmsAprioriandAprioriTid.
Note–Wemustrememberthatthefollowingconclusionsandthesummarybellowrefertothealgorithmsonthosetimes
Conclusions
TheApriorialgorithmsarebetterthanthepreviousalgorithms.
Forsmallproblemsbyfactors
Forlargeproblemsbyordersofmagnitudes.
Thealgorithmsarebestcombined.
Thealgorithmshowsgoodresultsinscale-upexperiments
AprioriTidusesC^kinsteadofthedatabase.IfC^kfitsinmemoryAprioriTidisfasterthanApriori
WhenC^kistoobigitcannotsitinmemory,andthecomputationtimeismuchlonger.ThusAprioriisfasterthanAprioriTid.
Summary
Associationrulesareanimportanttoolinanalyzingdatabases.
We’veseenanalgorithmwhichfindsallassociationrulesinadatabase.
Thealgorithmhasbettertimeresultsthenpreviousalgorithms.
Thealgorithmmaintainsitsperformancesforlargedatabases.
FP-TREEALGORITHM[8]
Candidategenerationisbyfarthemosttimeconsumingprocess,soitisdesirabletospeedthisup.FPTreealgorithmdirectlyminesfrequentitemsetswithoutgeneratingcandidates.TheclaimisthatbygatheringsufficientstatisticsintoaspecialstructurewhichcalledFPtree,allofthefrequentpatternscanbegeneratedwithoutgoingbacktothedatabase.Andthisdefinitelywillleadustobetterperformance.
AswelearnbeforeAprioriworkswellexceptwhentheinputis:
Lotsoffrequentpatternswithbigsetsofitemsorwithlowminimumsupportthreshold
Longpatterns
FPtreeavoidcandidatesetexplosionby:
Compacttreedatastructure(ItavoidrepeateddatascansthusitmuchsmallerthanthebasicDatabase).
Restrictedtest-only
Searchdivide-and-conquerbased
Algorithmssummarize
Thealgorithmmadeupfromtwophases:
Phase1-ConstructingFP-tree
ScanDBtofindL
Collectthesetoffrequentitems
SortLandDBindescendingfrequency
ScanDBagain-constructFP-tree
Phase2-ExecutingFP-Growth
MiningfrequentpatternsfromFP-tree
Processingfrequentitems
Onebyone
Bottomup
Eachitem
GeneratingaconditionalFP-tree
Algorithm–Phase1[8]
Algorithm–Phase2[8]
FP-Treealgorithmexample[8]
Herewe'llshowexamplewhichwillsummarizetheAlgorithmability.
Step1
ScanningDBtofindL
Example:Minimumsupport=60%
ScaneachTIDandupdatethefrequencyforeachiteminthenewtable
Table2.1.1-1-FPtreealgorithmexampleTid,item,andfrequency(1)
ScanDBtofindL(Listofallitemswhichmeetthesupport).
Afterscanning–Markingreentheitemswhichmeetthesupport
Table2.1.1-2-FPtreealgorithmexampleTid,item,andfrequency(2)
Step2
SortLindescendingfrequency
L={a:3,b:3,c:4,f:4,m:3,p:3}
L’={f:4,c:4,a:3,b:3,m:3,p:3}
Buildanewtablewhichcontainsonlytheitemswhichmeetthesupportandindescendingorder(seeL').
SortDB
Table2.1.1-3-FPtreealgorithmexampleTid,item,andfrequency(3)
Step3
InthisstepwescantheDBagaintoconstructtheFPtreetuplebytuple.
Westartwiththefirsttupletobuildthetreeinthesameorderoftheitems.Foreachitemwesetanumber,thisnumbernotehowmanytransactionsitbelongsto.
Table2.1.1-4-FPtreealgorithmexampleTid,item,andHeadertable(1)
Step3-Cont
Continuebuildingthetreeusingthesecondtuple.
Wecanseethenumbersineachnode.Itindicatesthenumberoftransactionswhichitbelongs
Table2.1.1-5-FPtreealgorithmexampleTid,item,andHeadertable(2)
Continuebuildingthetreeusingthethirdtuple.
Table2.1.1-6-FPtreealgorithmexampleTid,item,andHeadertable(3)
Continuebuildingthetreeusingtheforthtuple.
Table2.1.1-7-FPtreealgorithmexampleTid,item,andHeadertable(4)
Step3cont
Continuebuildingthetreeusingthelasttuple.
Table2.1.1-8-FPtreealgorithmexampleTid,item,andHeadertable(5)
Nowwe'llshowthesecondalgorithm-FPGrows
Afterthedatabaseiscompressedintoahighlycondensedandmuchsmallerdatastructure,wecontinuetothenextstep
MiningfrequentpatternsfromtheFP-Tree.Processingfrequentitemsonebyonebottomup.EachitemgeneratesaconditionalFP-Tree.
Figure2.1.1-5-FPgrowsexample
Exampleforp
FirstwemarkeachnodewhichisabovePinthesamebranch
Figure2.1.1-6-FPgrowsexampleforp(1)
Theonlyfrequentpatternsfor"P"are{p:3,cp:3}
Figure2.1.1-7-FPgrowsexamplep(2)
Exampleform
Again,firstwemarkeachnodewhichisaboveminthesamebranch
Figure2.1.1-8-FPgrowsexampleform(1)
Thefrequentpatternfor"m"are{m:3,am:3,cm:3,fm:3}
Figure2.1.1-9-FPgrowsexampleform(2)
SowerecursivelyconstructingconditionalFP-treefor:
am,cm,fm.
we'llstartwitham
Theprefixare{f:3,c:3}.Sothelargeitemsetswithamare:
{cam:3,fam:3}}
Figure2.1.1-10-FPgrowsexampleforam
Thefrequentpatternswithamare:
{am,cam,fam}.WerecursivelyconstructconditionalFP-Trees"cam","fam"
FP-Treefor"cam"
Thefrequentpatternswithcamare:{fcam}
Cam{fcam}
FP-Treefor"fam"
Thefrequentpatternswithfamis:{
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 工厂车间现场5培训
- 核安全风险辨识
- 数控车削加工技术 课件 项目七 端面切削工艺及编程
- (提升卷)第一单元 圆和扇形(单元测试)数学六年级上册单元速记巧练系列(冀教版)学生版
- 福建省泉州市南安市2024-2025学年四年级上学期期中考试数学试题 - 副本
- T-XYTX 002-2024 黄桃标准化生产与基地建设
- 河北省衡水市武强中学2024-2025学年高三年级上学期期中考试英语试题 含解析
- 高中语文第3单元文艺评论和随笔第9课说“木叶”课件新人教版必修
- 弃土场施工方案
- Windows Server网络管理项目教程(Windows Server 2022)(微课版)10.5 拓展案例2 NAT端口映射
- 四年级上册信息技术人教版第10课设动作与超链接(教案)
- 空气动力学数值方法:有限体积法(FVM):离散化技术与数值通量
- 北师大版九年级物理全一册电子课本教材
- 合作安全责任协议书范本
- 2024-2030年中国船舶电子导航系统行业市场发展趋势与前景展望战略分析报告
- 生产管理培训课件
- 2024秋八年级数学上册 第十四章 整式的乘法与因式分解14.1 整式的乘法 4整式的乘法-单项式与单项式相乘教学设计(新版)新人教版
- 小学语文整本书阅读《夏洛的网》导读课公开课一等奖创新教学设计
- 6以内的加减法
- DL∕T 1795-2017 柔性直流输电换流站运行规程
- 上海民政局夫妻离婚协议书(2024版)
评论
0/150
提交评论