IBM SPSS Modeler 14.2 for Association Analysis

Last Updated: 12/8/2011 12:57:58 PM


Data Mining with IBM SPSS Modeler 14.2

University of Arkansas

David Douglas

Association Analysis

Notes on Association Analysis using IBM SPSS Modeler 14.2

Association Rules Using Clementine

IBM SPSS Modeler 14.2 has three different algorithms for generating association rules. Input data format can be either tabular or transactional. The models are:

Apriori – all data must be categorical

Carma – categorical consequents but can have numeric inputs

Sequential – sequential association rules

Apriori produces association rules very efficiently. It also offers a choice of criterion measures to guide rule detection. Its major disadvantage is that only categorical (symbolic) fields are allowed as inputs.

Carma, unlike GRI and Apriori, offers rule-detection options that include support for both the antecedent and the consequent, and it handles data in transaction format. Additionally, it allows rules with multiple consequents (outcomes) and is not limited to categorical data.

Sequential association analysis takes the sequence of events into account. It works with either transaction or tabular data.
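To make the role of order concrete, the following minimal Python sketch counts how often one item is bought before another by the same customer. The toy data and the function name ordered_pair_support are hypothetical. The sketch only illustrates why sequence matters; it is not the algorithm used by the Sequence node.

```python
from itertools import combinations

# Hypothetical toy data: customer id -> items in the order they were purchased.
sequences = {
    1: ["beer", "chips", "salsa"],
    2: ["chips", "beer"],
    3: ["beer", "salsa"],
}

def ordered_pair_support(sequences):
    """Fraction of customers for whom item a was bought before item b."""
    counts = {}
    for items in sequences.values():
        seen = set()
        # combinations() keeps list order, so (a, b) means a came first.
        for a, b in combinations(items, 2):
            if a != b:
                seen.add((a, b))
        # Count each ordered pair at most once per customer.
        for pair in seen:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair: count / n for pair, count in counts.items()}

support = ordered_pair_support(sequences)
print(support[("beer", "salsa")])           # beer then salsa
print(support.get(("salsa", "beer"), 0.0))  # salsa then beer
```

An ordinary (non-sequential) association rule would treat the pair beer and salsa identically in both directions; here the two directions get different support values.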

Notes on Data Formats for Association Analysis

Market basket analysis is a natural fit for association analysis, and there are two general formats of data representation for it. The first is sometimes referred to as the transactional data format and the second is the tabular data format. The transactional data format requires only two fields: an ID field and a content field. For example (ignore quantities purchased for now),

TransactionID   Items
1               Broccoli
1               GreenPeppers
1               Corn
2               Asparagus
2               Squash
2               Corn
3               Corn
3               Tomatoes
…               …

Note that in this format a single transaction requires several records. SAS EM 6.1 requires this format unless you have its data warehousing software, which we do not have.

In the tabular data format, each record is a transaction (again ignoring quantities purchased for now), and a flag (0/1 or T/F) for each item indicates whether or not it was purchased. For example,

TransId   Asparagus   Beans   Broccoli   Corn   GreenPeppers   Squash   Tomatoes
1         0           0       1          1      1              0        0
2         1           0       0          1      0              1        0
3         0           1       0          1      0              1        1
n         1           1       0          0      1              0        1
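The relationship between the two formats can be sketched in a few lines of Python. This is a minimal illustration, not part of the Modeler workflow: the function name to_tabular is hypothetical, and the input uses only the transaction records listed explicitly in the transactional example, so items that never appear in those records (such as Beans) get no column.

```python
# Transaction-format records (ID, item) copied from the listed example rows.
records = [
    (1, "Broccoli"), (1, "GreenPeppers"), (1, "Corn"),
    (2, "Asparagus"), (2, "Squash"), (2, "Corn"),
    (3, "Corn"), (3, "Tomatoes"),
]

def to_tabular(records):
    """Pivot (id, item) records into one 0/1 flag row per transaction."""
    items = sorted({item for _, item in records})
    baskets = {}
    for trans_id, item in records:
        baskets.setdefault(trans_id, set()).add(item)
    header = ["TransId"] + items
    rows = [
        [trans_id] + [1 if item in basket else 0 for item in items]
        for trans_id, basket in sorted(baskets.items())
    ]
    return header, rows

header, rows = to_tabular(records)
print(header)
for row in rows:
    print(row)
```

Every item seen anywhere becomes a column, which is why the tabular form grows wide and mostly zero as the product list grows.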

Note that the tabular data format can become very cumbersome for a large number of products and a large number of transactions, and the result is typically a very sparse matrix. Thus, one of two approaches is generally taken when mining a large number of transactions with a large number of products:

SQL is used to create a transaction-format data file for the market basket association analysis.

A software product that does in-database data mining is used.
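As one possible reading of the first approach, the sketch below uses SQL (run through Python's built-in sqlite3 module) to turn a small flag table into transaction-format rows. The table name baskets, the in-memory database and the four-column subset of items are assumptions made for illustration; a production database would have its own schema and may already store sales one line per item.

```python
import sqlite3

# Subset of the example item columns, kept short for illustration.
items = ["Asparagus", "Beans", "Broccoli", "Corn"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE baskets (TransId INTEGER, "
    + ", ".join(f"{item} INTEGER" for item in items)
    + ")"
)
# First three rows of the tabular example, restricted to the columns above.
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?, ?, ?, ?)",
    [(1, 0, 0, 1, 1), (2, 1, 0, 0, 1), (3, 0, 1, 0, 1)],
)

# One SELECT per item column, keeping only purchased items,
# stacked with UNION ALL into (TransId, Item) rows.
unpivot_sql = " UNION ALL ".join(
    f"SELECT TransId, '{item}' AS Item FROM baskets WHERE {item} = 1"
    for item in items
) + " ORDER BY TransId, Item"

transactions = conn.execute(unpivot_sql).fetchall()
conn.close()

for trans_id, item in transactions:
    print(trans_id, item)
```

Only purchased items produce rows, so the sparse zeros of the tabular layout disappear in the transaction layout.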

IBM SPSS Modeler 14.2 does in-database mining via ODBC and with database vendors' products. For example, IBM SPSS Modeler 14.2 can be used for in-database data mining with DB2, and it likewise works with SQL Server and Oracle.

IBM SPSS Modeler 14.2 for Association Analysis

Apriori & Carma Models in Tabular Format

The Apriori model will be illustrated first. Place an Excel node on the model canvas as shown above, open the edit window and import the Baskets1n.xls file. Click the Types tab and click the Read Values button. Add a Type node and connect the Excel node to the Type node. Remember to click the Read Values button on the Types tab of the Excel node. Edit the Type node and do the following:

Set the CardId variable's Direction to None

Set all the Continuous variables' Direction to Input

Set all the food categorical (Flag) variables' Direction to Both

See the settings below.

Open the Apriori Baskets node, shown below. Recall that the Apriori model requires all variables to be categorical.

This example provides a custom name set in the Annotations tab. In the Fields tab, set the categorical variables that are possible for the Consequents. Note that pmethod, sex, homeown, income and age would not be consequents but could be antecedents.

Click the Model tab. A number of options are available for the modeler to adjust as appropriate. First is the Minimum antecedent support. Increasing this value will result in fewer rules, as will increasing the Minimum rule confidence. In this case, the Maximum number of antecedents has been set to 5. Execute the Apriori Baskets node. Double-click the model nugget to review the results.
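The effect of these three settings can be sketched with a brute-force rule search in Python. This is a minimal illustration on hypothetical toy baskets. The function find_rules and its parameter names are assumptions chosen to mirror the option labels, and the exhaustive enumeration is not how the Apriori node actually searches.

```python
from itertools import combinations

# Hypothetical toy baskets; each set holds the items flagged true in one record.
baskets = [
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "cannedveg", "frozenmeal"},
    {"beer", "frozenmeal"},
    {"cannedveg"},
    {"fish", "fruitveg"},
]

def find_rules(baskets, min_antecedent_support, min_confidence, max_antecedents):
    """Enumerate rules antecedent -> single consequent that pass both thresholds."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for size in range(1, max_antecedents + 1):
        for antecedent in combinations(items, size):
            matching = [b for b in baskets if set(antecedent) <= b]
            support = len(matching) / n  # share of records matching the antecedent
            if not matching or support < min_antecedent_support:
                continue
            for consequent in items:
                if consequent in antecedent:
                    continue
                # Confidence: share of matching records that also hold the consequent.
                confidence = sum(consequent in b for b in matching) / len(matching)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, support, confidence))
    return rules

loose = find_rules(baskets, 0.2, 0.5, 2)
strict = find_rules(baskets, 0.5, 0.9, 2)
print(len(loose), "rules with loose thresholds")
print(len(strict), "rules with strict thresholds")
for rule in strict:
    print(rule)
```

Raising either threshold can only remove rules from the result, which matches the behavior described for the Model tab.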

First note the Consequent and Antecedent columns. The first rule says that males that buy beer and frozen meals also buy canned vegetables 95.27% of the time. Support for this rule is also displayed: 14.8. Note that the rules are sorted by the Confidence column; you may wish to sort on a different column. Also, the Show/Hide Criteria menu allows more data to be shown. The bottom two menu options are Show all and Hide all. Select the Show all option.

Lift means the same here as in other models. Click the Generate menu option and select Rule Set.
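As a reminder of how the three measures relate, the following Python sketch computes antecedent support, confidence and lift for a single rule. The toy baskets and the function name rule_metrics are hypothetical. Lift is taken here as the rule's confidence divided by the overall proportion of records containing the consequent, so a value above 1 means the antecedent makes the consequent more likely than it is in the data as a whole.

```python
# Hypothetical toy baskets, one set of purchased items per record.
baskets = [
    {"beer", "frozenmeal"},
    {"beer", "frozenmeal"},
    {"beer"},
    {"frozenmeal"},
    {"fish"},
    {"fish"},
    {"wine"},
    {"wine"},
]

def rule_metrics(baskets, antecedent, consequent):
    """Return (antecedent support, confidence, lift) for antecedent -> consequent."""
    n = len(baskets)
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        raise ValueError("antecedent never occurs in the data")
    with_both = [b for b in with_antecedent if consequent in b]
    antecedent_support = len(with_antecedent) / n
    confidence = len(with_both) / len(with_antecedent)
    # Prior: how often the consequent occurs regardless of the antecedent.
    prior = sum(consequent in b for b in baskets) / n
    if prior == 0:
        raise ValueError("consequent never occurs in the data")
    lift = confidence / prior
    return antecedent_support, confidence, lift

support, confidence, lift = rule_metrics(baskets, {"beer"}, "frozenmeal")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```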

This example has been given the Rule set name of AproriFrozenMeal, the Target field is set to frozenmeal and the Default value has been set to 0. Click the OK button and the generated rule set node will be placed on the upper left of the stream canvas. Drag the generated model to the right of the Type node and connect the Type node to it.

Open the generated rule set node. Recall that this set of rules has a target field or variable of frozenmeal. For this target, there are 8 rules; rule 1 indicates that if one buys beer, then they will also buy frozen meals 58% of the time. Locate rule 8, which has a 94% probability: males that buy beer and canned vegetables have a 94% probability of buying frozen meals.

For convenience, add a Filter node to the right of the generated node and connect the generated node to the Filter node. The purpose of the Filter node is to eliminate the fields not used in the 8 generated rules. The variables not used in the 8 rules are: value, pmethod, income, age, fruitveg, freshmeat, dairy, cannedmeat, wine, softdrink, fish, confectionery. Note that two new variables have been created at the bottom of the variable list: $A-11 fields and $AC-11 fields. Do not filter out these two variables or cardid, as cardid will help find records in the data. Connect the Filter node to a Table node and execute the Table node.

As shown below, each record in the data set has been set to T if the confidence of a frozen meal in one of the rules is greater than 50%. The T/F is in the $A-11 fields column and the confidence is in the $AC-11 fields column. For example, row 3 (cardid = 10872) has a confidence of 0.747. Can you go back to the generated rule set node and find the rule that made this true?
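The way a rule set turns into a prediction and a confidence per record can be sketched as follows. The two rules are rule 1 and rule 8 as described in the text; the field values, the function name score and the choice to report the highest-confidence matching rule are assumptions for illustration, and the generated rule set node may resolve multiple matching rules differently.

```python
# Rules for the target frozenmeal: (antecedent conditions, confidence).
# Rule 1 and rule 8 from the text; the 0/1 and "M" encodings are assumed.
rules = [
    ({"beer": 1}, 0.58),
    ({"sex": "M", "beer": 1, "cannedveg": 1}, 0.94),
]
DEFAULT_VALUE = 0  # value predicted when no rule applies

def score(record, rules, default=DEFAULT_VALUE):
    """Return (prediction, confidence) using the best matching rule, if any."""
    matches = [
        confidence
        for conditions, confidence in rules
        if all(record.get(field) == value for field, value in conditions.items())
    ]
    if not matches:
        return default, None
    # Assumption: report the highest confidence among the matching rules.
    return 1, max(matches)  # 1 stands for the true flag value

print(score({"sex": "M", "beer": 1, "cannedveg": 1}, rules))
print(score({"sex": "F", "beer": 1, "cannedveg": 0}, rules))
print(score({"sex": "F", "beer": 0, "cannedveg": 1}, rules))
```

Tracing a record by hand through the rules in this way is the same exercise as finding which rule produced the confidence shown for cardid 10872.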

Carma Model

For the Carma model, all inputs are considered to have a role of Both. Thus, only the purchasable items should be included as the inputs on the Fields tab. Open the Carma node, click the Fields tab, select the Use custom settings option and select the inputs for the model.

Accept all other defaults and run the Carma node.

Double-click the Carma nugget to view the results. The format of the output is identical to that of the Apriori model, so the data presented there need not be explained again. As with the Apriori model, generate a Rule Set (use the same target value of frozenmeal), drag it to the right of the Type node and connect the Type node to it. Open the generated rule set node and note that the number of rules, three, is fewer than in the Apriori model. The rule with the highest confidence says that those who buy beer and canned vegetables have an 87.4% probability of also buying frozen meals.

Finish out the stream by adding the Filter and Table nodes. Remember to filter out all the variables not used in the rules to make it easier to read the table. A portion of the Table node output is shown below. Record 10872 has T, the same as in the Apriori model, but with less confidence: 0.675 in this case.

Basket1n Data

Basket summary:

cardid. Loyalty card identifier for customer purchasing this basket.

value. Total purchase price of basket.

pmethod. Method of payment for basket.

Personal details of cardholder:

sex

homeown. Whether or not cardholder is a homeowner.

income

age

Basket contents (flags for presence of product categories):

fruitveg

freshmeat

dairy

cannedveg

cannedmeat

frozenmeal

beer

wine

softdrink

fish

confectionery

A transaction file format will be used to illustrate the IBM SPSS Modeler 14.2 Apriori, Carma and Sequence models. The file, GroceryTrans1-Time.xls, contains transaction data with a sequence column, Time, as shown below. Notice that the Role of the Product variable has been set to Both.

Recall that only the Sequence model makes use of Time, so it will not be used in the Apriori and Carma analyses. Create an IBM SPSS Modeler 14.2 stream as shown below.

Because the Time field cannot be used by the Carma or Apriori model, a Type node is used to remove this field from the analysis. To do this, ensure that the Role for the Time variable is set to None.

Then, open the Carma node and click the Fields tab. This will require setting the ID variable as well as the Content variable; how to do this may not be obvious until you click the Use custom settings option. Click the Use custom settings option and then click the Use transaction format check box.

After you click the Use transaction format check box, the Carma node displays drop-down boxes that allow the user to enter the ID field and the Content field. In this example, the ID
