Additive Models, Trees, and Related Models
Prof. Liqing Zhang
Dept. of Computer Science & Engineering, Shanghai Jiaotong University

Outline
- 9.1: Generalized Additive Models
- 9.2: Tree-Based Methods
- 9.3: PRIM: Bump Hunting
- 9.4: MARS: Multivariate Adaptive Regression Splines
- 9.5: HME: Hierarchical Mixture of Experts

9.1 Generalized Additive Models

In the regression setting, a generalized additive model has the form:

    E[Y | X_1, X_2, \dots, X_p] = \alpha + f_1(X_1) + f_2(X_2) + \dots + f_p(X_p)

Here the f_j's are unspecified smooth ("nonparametric") functions. Instead of using a linear basis expansion (LBE) as in Chapter 5, we fit each function using a scatterplot smoother (e.g. a cubic smoothing spline).

GAM (cont.)

For two-class classification, the additive logistic regression model is:

    \log \frac{\mu(X)}{1 - \mu(X)} = \alpha + f_1(X_1) + \dots + f_p(X_p)

Here \mu(X) = \Pr(Y = 1 \mid X).

GAM (cont.)

In general, the conditional mean \mu(X) of a response Y is related to an additive function of the predictors via a link function g:

    g[\mu(X)] = \alpha + f_1(X_1) + \dots + f_p(X_p)
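As a concrete illustration of the link-function idea, the classical links can be written as plain functions. This is our own sketch, not part of the slides; `NormalDist` from the Python standard library supplies the inverse normal CDF needed for the probit.

```python
import math
from statistics import NormalDist

# Illustrative sketch (not from the slides): classical link functions g(mu)
# mapping a conditional mean mu onto the scale of the additive predictor.

def identity_link(mu):      # Gaussian response: g(mu) = mu
    return mu

def logit_link(mu):         # binomial: g(mu) = log(mu / (1 - mu))
    return math.log(mu / (1.0 - mu))

def probit_link(mu):        # binomial: g(mu) = Phi^{-1}(mu)
    return NormalDist().inv_cdf(mu)

def log_link(mu):           # Poisson counts: g(mu) = log(mu)
    return math.log(mu)

def inv_logit(eta):         # inverse logit: maps the additive predictor back to mu
    return 1.0 / (1.0 + math.exp(-eta))
```

For the additive logistic model above, g is the logit, and \mu(X) is recovered from the additive predictor via `inv_logit`.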

Examples of classical link functions:
- Identity: g(\mu) = \mu, used for Gaussian response data.
- Logit: g(\mu) = \log[\mu / (1 - \mu)], for binomial probabilities.
- Probit: g(\mu) = \Phi^{-1}(\mu), also for binomial probabilities.
- Log: g(\mu) = \log(\mu), for Poisson count data.

Fitting Additive Models

The additive model has the form:

    Y = \alpha + \sum_{j=1}^p f_j(X_j) + \varepsilon

Here the error term \varepsilon has mean zero. Given observations (x_i, y_i), a criterion like the penalized sum of squares can be specified for this problem:

    \mathrm{PRSS}(\alpha, f_1, \dots, f_p) = \sum_{i=1}^N \Big( y_i - \alpha - \sum_{j=1}^p f_j(x_{ij}) \Big)^2 + \sum_{j=1}^p \lambda_j \int f_j''(t_j)^2 \, dt_j    (9.7)

where the \lambda_j \ge 0 are tuning parameters.

FAM (cont.)

Conclusions: The minimizer of the PRSS is an additive model of cubic splines; however, without further restrictions the solution is not unique. If

    \sum_{i=1}^N f_j(x_{ij}) = 0 \quad \text{for all } j

holds, it is easy to see that:

    \hat{\alpha} = \mathrm{ave}(y_i)

If, in addition to this restriction, the matrix of input values has full column rank, then (9.7) is a strictly convex criterion and has a unique solution. If the matrix is singular, then the linear part of f_j cannot be uniquely determined (Buja et al., 1989).

Learning GAM: Backfitting

Backfitting algorithm:
Initialize: \hat{\alpha} = \frac{1}{N} \sum_{i=1}^N y_i, and \hat{f}_j \equiv 0 for all j.
Cycle: j = 1, 2, \dots, p, \dots, 1, 2, \dots, p, \dots (m cycles):

    \hat{f}_j \leftarrow S_j \Big[ \{ y_i - \hat{\alpha} - \sum_{k \ne j} \hat{f}_k(x_{ik}) \}_1^N \Big]
    \hat{f}_j \leftarrow \hat{f}_j - \frac{1}{N} \sum_{i=1}^N \hat{f}_j(x_{ij})

until the functions \hat{f}_j change less than a prespecified threshold. (This is Algorithm 9.1, the Backfitting Algorithm for Additive Models.)

Backfitting: Points to Ponder
- Computational advantage?
- Convergence?
- How to choose the fitting functions (smoothers) S_j?

Additive Logistic Regression

The generalized additive logistic model has the form:

    \log \frac{\Pr(Y = 1 \mid X)}{\Pr(Y = 0 \mid X)} = \alpha + f_1(X_1) + \dots + f_p(X_p)

The functions f_j are estimated by a backfitting algorithm within a Newton-Raphson procedure.

Logistic Regression
- Model the class posterior in terms of the K − 1 log-odds:

      \log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x, \quad k = 1, \dots, K - 1

- The decision boundary between two classes is the set of points where their log-odds are equal.
- Linear discriminant function for class k: \delta_k(x).
- Classify to the class with the largest value of its \delta_k(x).

Logistic Regression (cont.)

Parameter estimation maximizes the conditional log-likelihood, solved by IRLS (iteratively reweighted least squares). In particular, for the two-class case, using the Newton-Raphson algorithm to solve the score equations, the objective function is:

    \ell(\beta) = \sum_{i=1}^N \big[ y_i \beta^T x_i - \log(1 + e^{\beta^T x_i}) \big]

Feature Selection

Find a subset of the variables that is sufficient for explaining their joint effect on the response.
- One way is to repeatedly drop the least significant coefficient and refit the model until no further terms can be dropped.
- Another strategy is to refit each model with one variable removed, and perform an analysis of deviance to decide which variable to exclude.

Regularization
- Maximum penalized likelihood.
- Shrinking the parameters via an L1 constraint, imposing a margin constraint in the separable case.

Fitting Logistic Regression vs. Fitting Additive Logistic Regression

Fitting logistic regression: (1) compute starting values; (2) iterate: form the working responses z_i and weights w_i, and use weighted least squares to fit a linear model to the z_i with weights w_i, giving new estimates; (3) continue step 2 until convergence. Fitting additive logistic regression follows the same scheme, except that step 2 uses a weighted backfitting algorithm to fit an additive model to the z_i with weights w_i.

Additive Logistic Regression: Local Scoring

1. Compute starting values: \hat{\alpha} = \log[\bar{y} / (1 - \bar{y})], where \bar{y} = \mathrm{ave}(y_i), the sample proportion of ones, and set \hat{f}_j \equiv 0 for all j.
2. Define \hat{\eta}(x_i) = \hat{\alpha} + \sum_j \hat{f}_j(x_{ij}) and \hat{p}(x_i) = 1 / [1 + \exp(-\hat{\eta}(x_i))]. Iterate:
   a. Construct the working target variable z_i = \hat{\eta}(x_i) + (y_i - \hat{p}(x_i)) / [\hat{p}(x_i)(1 - \hat{p}(x_i))].
   b. Construct the weights w_i = \hat{p}(x_i)(1 - \hat{p}(x_i)).
   c. Fit an additive model to the targets z_i with weights w_i, using a weighted backfitting algorithm. This gives new estimates \hat{\alpha}, \hat{f}_j.
3. Continue step 2 until the change in the functions falls below a prespecified threshold.

(Algorithm 9.2, the Local Scoring Algorithm for the Additive Logistic Regression Model.)

SPAM Detection via Additive Logistic Regression

Input variables (predictors):
- 48 quantitative predictors: the percentage of words in the email that match a given word. Examples include business, address, internet, free, and george. The idea was that these could be customized for individual users.
- 6 quantitative predictors: the percentage of characters in the email that match a given character: ch;, ch(, ch[, ch!, ch$, and ch#.
- CAPAVE: the average length of uninterrupted sequences of capital letters.
- CAPMAX: the length of the longest uninterrupted sequence of capital letters.
- CAPTOT: the sum of the lengths of uninterrupted sequences of capital letters.

Output variable: SPAM (1) or email (0). The f_j's are taken as cubic smoothing splines.

SPAM Detection: Results

    True class    Predicted email (0)    Predicted SPAM (1)
    Email (0)     58.5%                  2.5%
    SPAM (1)      2.7%                   36.2%

Sensitivity: probability of predicting SPAM given the true state is SPAM = 36.2 / (36.2 + 2.7) ≈ 93.1%.
Specificity: probability of predicting email given the true state is email = 58.5 / (58.5 + 2.5) ≈ 95.9%.

GAM: Summary
- Useful, flexible extensions of linear models.
- The backfitting algorithm is simple and modular.
- Interpretability of the predictors (input variables) is not obscured.
- Not suitable for very large data-mining applications (why?).

9.2 Tree-Based Methods

Background: tree-based models partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one. A popular method for tree-based regression and classification is CART (Classification And Regression Trees).

CART Example

Consider a regression problem with continuous response Y and inputs X_1 and X_2. To simplify matters, we consider the partition shown in the top right panel of Figure 9.2. The corresponding regression model predicts Y with a constant c_m in region R_m. For illustration, we choose c_1 = −5, c_2 = −7, c_3 = 0, c_4 = 2, c_5 = 4, shown in the bottom right panel of Figure 9.2.

Regression Tree

Suppose we have a partition into M regions R_1, R_2, ..., R_M. We model the response Y with a constant c_m in each region:

    f(x) = \sum_{m=1}^M c_m I(x \in R_m)
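A minimal sketch of this piecewise-constant model, with the per-region constant taken as the average response in that region. This is our own illustration; `region_of` and the toy data are hypothetical.

```python
# Piecewise-constant regression model (illustrative sketch): given a partition
# of the inputs into regions, predict in each region the average of the
# training responses falling in it, i.e. c_m = ave(y_i | x_i in R_m).

def region_means(X, y, region_of):
    """region_of(x) -> region index m; returns the dict {m: c_m}."""
    sums, counts = {}, {}
    for x, yi in zip(X, y):
        m = region_of(x)
        sums[m] = sums.get(m, 0.0) + yi
        counts[m] = counts.get(m, 0) + 1
    return {m: sums[m] / counts[m] for m in sums}

# Toy partition of the real line at s = 0.5 (hypothetical data).
region_of = lambda x: 0 if x <= 0.5 else 1
c = region_means([0.1, 0.2, 0.9, 1.0], [1.0, 3.0, 5.0, 7.0], region_of)
# c[0] is the mean of {1.0, 3.0}; c[1] is the mean of {5.0, 7.0}
```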

If we adopt as our criterion minimization of the RSS, it is easy to see that the best \hat{c}_m is just the average of the y_i in region R_m:

    \hat{c}_m = \mathrm{ave}(y_i \mid x_i \in R_m)

Regression Tree (cont.)

Finding the best binary partition in terms of minimum RSS is computationally infeasible, so a greedy algorithm is used. Consider a splitting variable j and split point s, defining the pair of half-planes

    R_1(j, s) = \{ X \mid X_j \le s \} \quad \text{and} \quad R_2(j, s) = \{ X \mid X_j > s \},

and seek the pair (j, s) that solves:

    \min_{j,\, s} \Big[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \Big]
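The greedy search just described can be sketched directly. This is our own illustration (the name `best_split` and the toy data are hypothetical), and the exhaustive double loop is written for clarity, not efficiency.

```python
# Greedy search for the best single split (j, s) minimizing RSS: for each
# feature j and candidate split point s, the inner minimization is solved by
# taking c1, c2 as the response means in the two half-planes.

def best_split(X, y):
    """X: list of feature tuples; y: responses. Returns (j, s, rss)."""
    best = (None, None, float("inf"))
    p = len(X[0])
    for j in range(p):
        for s in sorted({x[j] for x in X}):
            left = [yi for x, yi in zip(X, y) if x[j] <= s]
            right = [yi for x, yi in zip(X, y) if x[j] > s]
            if not left or not right:          # skip degenerate splits
                continue
            c1 = sum(left) / len(left)
            c2 = sum(right) / len(right)
            rss = sum((v - c1) ** 2 for v in left) + \
                  sum((v - c2) ** 2 for v in right)
            if rss < best[2]:
                best = (j, s, rss)
    return best

# Toy data: y jumps as feature 0 crosses 0.5, so the best split is on feature 0.
X = [(0.1, 9.0), (0.2, 1.0), (0.8, 5.0), (0.9, 2.0)]
y = [1.0, 1.0, 10.0, 10.0]
j, s, rss = best_split(X, y)
```

Having found the best split, the same search is repeated inside each of the two resulting regions, exactly as the slides describe next.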

Here, for any choice of j and s, the inner minimization is solved by:

    \hat{c}_1 = \mathrm{ave}(y_i \mid x_i \in R_1(j, s)), \quad \hat{c}_2 = \mathrm{ave}(y_i \mid x_i \in R_2(j, s))

For each splitting variable X_j, the determination of the split point s can be done very quickly, and hence by scanning through all of the inputs, determination of the best pair (j, s) is feasible. Having found the best split, we partition the data into the two resulting regions and repeat the splitting process within each of the two regions.

Regression Tree (cont.)

We index terminal nodes by m, with node m representing region R_m. Let |T| denote the number of terminal nodes in T. Letting

    N_m = \#\{ x_i \in R_m \}, \quad \hat{c}_m = \frac{1}{N_m} \sum_{x_i \in R_m} y_i, \quad Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} (y_i - \hat{c}_m)^2,

we define the cost-complexity criterion:

    C_\alpha(T) = \sum_{m=1}^{|T|} N_m Q_m(T) + \alpha |T|

Classification Tree

Let \hat{p}_{mk} be the proportion of class k observations in node m, and let k(m) = \arg\max_k \hat{p}_{mk} be the majority class in node m. Instead of the Q_m(T) defined in (9.15) for regression, we have different measures Q_m(T) of node impurity, including the following:

- Misclassification error: 1 - \hat{p}_{m, k(m)}
- Gini index: \sum_{k=1}^K \hat{p}_{mk} (1 - \hat{p}_{mk})
- Cross-entropy (deviance): -\sum_{k=1}^K \hat{p}_{mk} \log \hat{p}_{mk}

Classification Tree (cont.)

Example: for two classes, if p is the proportion in the second class, these three measures are:

1 − max(p, 1 − p),  2p(1 − p),  and  −p log p − (1 − p) log(1 − p),

respectively.

9.3 PRIM: Bump Hunting

The patient rule induction method (PRIM) seeks boxes in the feature space in which the response average is high. Hence it looks for maxima of the target function, an exercise known as bump hunting.

9.4 MARS: Multivariate Adaptive Regression Splines

MARS uses expansions in piecewise-linear basis functions of the form (x − t)_+ and (t − x)_+. We call the two functions a reflected pair.

MARS (cont.)

The idea of MARS is to form reflected pairs for each input X_j, with knots at each observed value x_{ij} of that input. The collection of basis functions is therefore:

    C = \{ (X_j - t)_+, \ (t - X_j)_+ \}, \quad t \in \{ x_{1j}, x_{2j}, \dots, x_{Nj} \}, \ j = 1, \dots, p

The model has the form:

    f(X) = \beta_0 + \sum_{m=1}^M \beta_m h_m(X),

where each h_m(X) is a function in C, or a product of two or more such functions.

MARS (cont.)

We start with only the constant function h_0(X) = 1 in the model set M, and all functions in the set C are candidate functions. At each stage we add to the model set M the term of the form

    \hat{\beta}_{M+1} h_\ell(X) \cdot (X_j - t)_+ + \hat{\beta}_{M+2} h_\ell(X) \cdot (t - X_j)_+, \quad h_\ell \in M,

that produces the largest decrease in training error. The process continues until the model set M contains some preset maximum number of terms.

MARS (cont.)

This process typically overfits the data, so a backward deletion procedure is applied: the term whose removal causes the smallest increase in RSS is deleted from the model at each step, producing an estimated best model \hat{f}_\lambda of each size \lambda. Generalized cross-validation is applied to estimate the optimal value of \lambda:

    \mathrm{GCV}(\lambda) = \frac{\sum_{i=1}^N (y_i - \hat{f}_\lambda(x_i))^2}{(1 - M(\lambda)/N)^2}

The value M(\lambda) is the effective number of parameters in the model.

9.5 HME: Hierarchical Mixtures of Experts

The HME method can be viewed as a variant of tree-based methods. The main difference is that the tree splits are not hard decisions but rather soft probabilistic ones. In an HME, a linear (or logistic regression) model is fitted in each terminal node, instead of a constant as in CART.

HME (cont.)

A simple two-level HME model is shown in Figure 9.13. It can be viewed as a tree with soft splits at each non-terminal node.

HME (cont.)

The terminal nodes are called experts, and the non-terminal nodes are called gating networks. The idea: each expert provides a prediction of the response Y, and these predictions are combined by the gating networks.
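Returning to the MARS construction above, the reflected-pair basis can be sketched as follows. This is our own illustration; the names and the toy data are hypothetical.

```python
# Reflected-pair basis functions for MARS (illustrative sketch): for input X_j
# with knot t, form the pair h(x) = (x_j - t)_+ and h(x) = (t - x_j)_+.

def pos(z):
    """Positive part: z_+ = max(z, 0)."""
    return z if z > 0 else 0.0

def reflected_pair(t, j):
    """Return the reflected pair (x_j - t)_+ and (t - x_j)_+ as functions of x."""
    return (lambda x: pos(x[j] - t), lambda x: pos(t - x[j]))

# The candidate set C: one reflected pair per input per observed knot
# (toy data with a single input, p = 1, N = 3, so |C| = 2 * N * p = 6).
X = [(0.2,), (0.5,), (0.9,)]
C = [h for row in X for h in reflected_pair(row[0], 0)]
```

Products of functions already in the model with members of C give the higher-order terms added in the forward pass.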

HME (cont.)

The top gating network has the output:

    g_j(x) = \frac{e^{\gamma_j^T x}}{\sum_{k=1}^K e^{\gamma_k^T x}}, \quad j = 1, 2, \dots, K,

where each \gamma_j is a vector of unknown parameters. This represents a soft K-way split (K = 2 in Figure 9.13). Each g_j(x) is the probability of assigning an observation with feature vector x to the j-th branch.

HME (cont.)

At the second level, the gating networks have a similar form:

    g_{\ell \mid j}(x) = \frac{e^{\gamma_{j\ell}^T x}}{\sum_{k=1}^K e^{\gamma_{jk}^T x}}, \quad \ell = 1, 2, \dots, K.

At the experts, we have a model for the response variable of the form

    Y \sim \Pr(y \mid x, \theta_{j\ell}),

which differs according to the problem.
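The gating-network output above is a softmax over linear scores in x. A minimal sketch (our own code, with hypothetical parameter values):

```python
import math

# Soft K-way split at an HME gating node (illustrative sketch): the gating
# output is a softmax over linear scores, g_j(x) = exp(gamma_j^T x) / sum_k exp(gamma_k^T x).

def gating(x, gammas):
    """x: feature vector (include a leading 1.0 for an intercept);
    gammas: list of K parameter vectors. Returns the K branch probabilities."""
    scores = [sum(g_i * x_i for g_i, x_i in zip(g, x)) for g in gammas]
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy two-way split (K = 2) with hypothetical parameters: equal scores
# give equal branch probabilities.
probs = gating((1.0, 2.0), [(0.0, 0.0), (0.0, 0.0)])
```

In a full two-level HME, the prediction is the gating-weighted combination of the expert models' predictions, with the second-level gates nested inside the first.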
