1. ML estimation of an exponential model (10 points)

A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

    p(x | b) = (1/b) e^{-x/b},  x >= 0.

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate for b.

2. Poisson

For the Poisson distribution p(x | lambda) = lambda^x e^{-lambda} / x!, x = 0, 1, 2, ..., the log likelihood of N observations x_1, ..., x_N is

    g(lambda) = sum_{i=1}^N x_i log lambda - N lambda - sum_{i=1}^N log x_i!

Setting dg/dlambda = (1/lambda) sum_i x_i - N = 0 gives the ML estimate lambda_hat = (1/N) sum_i x_i. Similarly, for the Bernoulli likelihood p^x (1-p)^{1-x} the ML estimate of p is the sample mean, and for the geometric distribution the ML estimate is p_hat = 1/m_bar, the reciprocal of the average number of tosses.

3. Conjugate priors
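The result in part (c) — the ML estimate of b is the sample mean — can be checked numerically. A minimal sketch (the data values here are arbitrary nonnegative numbers chosen for illustration):

```python
import math

# Arbitrary nonnegative data standing in for the N observations x_i.
xs = [0.2, 1.5, 0.7, 3.1, 0.05, 0.9, 2.2, 0.4]
N = len(xs)

def log_likelihood(b):
    # log L(b) = -N log b - (1/b) * sum_i x_i
    return -N * math.log(b) - sum(xs) / b

# Grid-search the log-likelihood; the maximizer should land on the sample mean.
grid = [0.01 * k for k in range(1, 1001)]   # b in (0, 10]
b_best = max(grid, key=log_likelihood)
b_mle = sum(xs) / N                          # closed-form ML estimate

print(b_best, b_mle)  # agree up to the grid step
```

The same grid-search check applied to the Poisson log likelihood above would likewise pick out the sample mean as lambda_hat.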
Given a likelihood p(x | theta) for a class of models with parameters theta, a conjugate prior is a distribution p(theta | gamma) with hyperparameters gamma, such that the posterior distribution

    p(theta | X, gamma) ∝ p(X | theta) p(theta | gamma)

is in the same family as the prior.

(a) Suppose that the likelihood is given by the exponential distribution with rate parameter lambda:

    p(x | lambda) = lambda e^{-lambda x}.

Show that the gamma distribution

    Gamma(lambda | alpha, beta) = (beta^alpha / Gamma(alpha)) lambda^{alpha - 1} e^{-beta lambda}

is a conjugate prior for the exponential. Derive the parameter update given observations x_1, ..., x_N, and the prediction distribution p(x_{N+1} | x_1, ..., x_N).

(b) Show that the beta distribution is a conjugate prior for the geometric distribution

    p(x = k | theta) = (1 - theta)^{k-1} theta,

which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is theta. Derive the parameter update rule and prediction distribution.
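The conjugacy claim in part (a) can be verified numerically: prior times likelihood must be proportional to a Gamma(alpha + N, beta + sum_i x_i) density, i.e., their log difference must be constant in lambda. A sketch (hyperparameters and observations chosen arbitrarily):

```python
import math

alpha, beta = 2.0, 1.5          # arbitrary gamma prior hyperparameters
xs = [0.3, 1.2, 0.8, 2.0]       # arbitrary observations
N, S = len(xs), sum(xs)

def log_gamma_pdf(lam, a, b):
    # log Gamma(lam | a, b) = a log b - log Γ(a) + (a - 1) log lam - b lam
    return a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(lam) - b * lam

def log_posterior_unnorm(lam):
    # log prior + log likelihood, with likelihood Π_i lam e^{-lam x_i}
    return log_gamma_pdf(lam, alpha, beta) + N * math.log(lam) - lam * S

# The difference to the claimed posterior Gamma(alpha + N, beta + S) must be
# a constant in lambda (the log normalizer) — that is exactly conjugacy.
lams = [0.1, 0.5, 1.0, 2.0, 5.0]
diffs = [log_posterior_unnorm(l) - log_gamma_pdf(l, alpha + N, beta + S)
         for l in lams]
print(max(diffs) - min(diffs))  # ~0: the two curves differ only by a constant
```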
(c) Suppose p(theta | gamma) is a conjugate prior for the likelihood p(x | theta); show that the mixture prior

    p(theta | gamma_1, ..., gamma_M) = sum_{m=1}^M w_m p(theta | gamma_m)

is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.

(d) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. (Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed alpha.)

(Extra credit, 20 points) Explore the case where the likelihood is a mixture with fixed components and unknown weights, i.e., the weights are the parameters to be learned.

True/False questions

(2) [Statement garbled in source; it ends "... decreases as n increases."]
(3) For regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set. (F)
(5) Boosting and Bagging both combine multiple classifiers by voting, and both set each individual classifier's weight according to its accuracy. (F)
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F) While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
( ) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. (F)
( ) [Two statements garbled in source: one concerning the computation of the Lasso in regression, one concerning fit to the training data.]
( ) Gradient descent can get stuck in local minima, but the EM algorithm cannot. (F)
( ) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
( ) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution w on the training data. (T)
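The last statement can be confirmed with a one-dimensional closed-form ridge fit (synthetic data, no intercept, for brevity): the training error of the regularized solution is non-decreasing in the penalty.

```python
# 1-D least squares without intercept: w(lam) = sum(x*y) / (sum(x^2) + lam),
# the minimizer of sum_i (y_i - w x_i)^2 + lam * w^2. Data are synthetic.
xs = [0.5, 1.0, 1.5, 2.0, 3.0]
ys = [1.1, 1.9, 3.2, 3.8, 6.1]

sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)

def train_sse(lam):
    w = sxy / (sxx + lam)                       # closed-form ridge solution
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

sse = [train_sse(lam) for lam in [0.0, 0.1, 1.0, 10.0, 100.0]]
print(sse)  # non-decreasing: regularization cannot reduce training error
```

The unregularized solution already minimizes the training error, so any penalized solution can only do as well or worse on the training set — which is why the statement is True.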
( ) True/False: In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution w on unseen test data. (F)
( ) Besides the EM algorithm, gradient descent can also be used to fit a mixture-of-Gaussians model. (T)
( ) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel. True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
( ) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. False! If the data is not separable by a linear combination of the weak classifiers, AdaBoost can't achieve zero training error.
( ) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
( ) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)

Regression

1. (10 points) The figure plots the mean log-probability of labels on the training and test sets as a function of the regularization parameter C. Is the claim "as C increases, the test log-probability will always increase" correct? Explain why or why not, and explain why the test log-probability decreases for large values of C.

2. Linear model: y ~ N(w0 + w1 x, sigma^2). The data (100 points) are as shown in the figure.
(a) Estimate the parameters by maximum likelihood and sketch the fitted model.
(b) Estimate by penalized maximum likelihood, i.e., add -(C/2) w1^2 to the log-likelihood objective, and sketch the fitted model when C is very large.
(c) Compared with the fit in (a), does the estimated variance sigma^2 of the Gaussian become larger, smaller, or stay the same?
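A sketch of parts (b)–(c) of the linear-model problem. The data below are synthetic stand-ins for the figure, and as a simplification the penalty is applied directly to the squared-error objective (equivalently, sigma^2 is held at 1 inside the penalized log-likelihood): for very large C the slope is driven to zero and the estimated noise variance grows toward the raw variance of y.

```python
def fit(xs, ys, C):
    # Penalized fit for y ~ N(w0 + w1*x, sigma^2) with penalty -(C/2)*w1^2:
    # closed form w0 = ybar - w1*xbar, w1 = Sxy / (Sxx + C).
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    w1 = sxy / (sxx + C)
    w0 = ybar - w1 * xbar
    sigma2 = sum((y - w0 - w1 * x) ** 2 for x, y in zip(xs, ys)) / n
    return w1, sigma2

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # synthetic data standing in for the figure
ys = [0.2, 1.1, 2.3, 2.9, 4.2]

w1_ml, s2_ml = fit(xs, ys, C=0.0)       # plain maximum likelihood
w1_big, s2_big = fit(xs, ys, C=1e6)     # heavily regularized
print(w1_ml, w1_big, s2_ml, s2_big)     # w1 -> 0, and sigma^2 grows
```

This is the intuition for part (c): with the slope suppressed, the residuals absorb the trend in the data, so the estimated sigma^2 becomes larger.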
3. [Problem garbled in source: features x_j (j = 1, 2), a Gaussian model y ~ N(...) with coefficients such as 10, 7, 2, 5, and a model-selection table over candidate models of degree 1, 2, 8, and 10 — first chosen on n training points by minimum/maximum training error, then re-chosen after a large independent set becomes available. The table itself is unrecoverable.]

The approximation error of a polynomial regression model depends on the number of training points. (T)
The structural error of a polynomial regression model depends on the number of training points. (F)

4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise (that is, y = w0 + w1 x + w2 x^2 + w3 x^3 + w4 x^4 + w5 x^5 + eps, with eps ~ N(0, sigma^2)). From the training pairs {x, y} we learn model A, a polynomial of degree 4, and model B, a polynomial of degree 6. Which model is likely to fit the test data better?
Answer: the degree 6 polynomial. Since the true model is a degree 5 polynomial and we have enough training data, the model we learn for a degree 6 polynomial will likely fit a very small coefficient for x^6. Thus, even though it is a degree 6 polynomial, it will actually behave in a very similar way to a degree 5 polynomial, which is the correct model, leading to a better fit to the data.

5. Input-dependent noise in regression

Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x >= 0).

(a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
(iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y^2; so by replacing sigma^2 with sigma^2 x, we get a variance that increases linearly with x. (Note also the change to the normalization "constant.") (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames w1.

(b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family(ies) you chose.
(ii) and (iii). (Note that (iii) works only if sigma^2 = 0; (i) exhibits a large variance at x = 0, and the variance appears independent of x.)

(c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model. True. In both cases the algorithm will recover the true underlying model.

(d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to w1.

Classification

1. Generative vs. discriminative models

Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or generative classifier? Why?
Generative model: it models p(x | y), which supports density estimation and hence outlier detection.

Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?
Discriminative model: when the number of training samples is small, directly classifying with a discriminative model usually works better.

(d) Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or generative classifier? Why?
Generative model: when there are many samples, the correct generative model can be learned.

2. Logistic regression

Figure 2: Log-probability of labels as a function of regularization parameter C. Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with quadratic regularization penalty and different values of the regularization parameter C.

(1) In training a logistic regression model by maximizing the likelihood of the labels given the inputs we have multiple locally optimal solutions. (F)
Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
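Concavity of the logistic log-likelihood can be illustrated numerically: gradient ascent started from very different initial weights converges to the same maximizer. A sketch with a hypothetical 1-D, non-separable dataset (no bias term):

```python
import math

# Hypothetical 1-D data: (x, y) pairs with y in {0, 1}; not linearly
# separable, so the maximum-likelihood weight is finite and unique.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (1.0, 1), (2.0, 1), (0.5, 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad(w):
    # Gradient of the log-likelihood sum_i [y_i log p_i + (1 - y_i) log(1 - p_i)]
    # for p_i = sigmoid(w * x_i).
    return sum((y - sigmoid(w * x)) * x for x, y in data)

def ascend(w, steps=20000, lr=0.01):
    for _ in range(steps):
        w += lr * grad(w)
    return w

w_a, w_b = ascend(-5.0), ascend(+5.0)
print(w_a, w_b)  # both starting points reach (nearly) the same maximizer
```

A concave objective has a single basin, so the starting point does not matter — which is the content of the answer to item (1).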
(2) A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned", we will continue jumping from side to side of the optimal solution, and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.

(3) The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution and thus the (average) log-probability of the training examples can only get worse.

(4) Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor, and thus give less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain our allowed predictors, we are not able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.

(5) The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we will observe such a decrease will scale up with the number of examples.

(6) Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)
A regularization penalty for feature selection must have a non-zero derivative at zero. Otherwise, the regularization has no effect at zero, and the weights will tend to be slightly non-zero, even when this does not improve the log-probabilities by much.

3. Regularized logistic regression

This problem refers to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model (for simplicity we do not use the bias parameter w0). The training data can be separated with zero training error — see line L1 in Figure 1(b) for instance. [(a) The 2-dimensional dataset used in Problem 2; (b) the points can be separated by L1 (solid line); possible other decision boundaries are shown by L2, L3, L4.]

Consider a regularization approach where we try to maximize the penalized log-likelihood for large C. Note that only w2 is penalized. We would like to know which of the four lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3 or L4, determine whether it can result from regularizing w2. If not, explain very briefly why not.

L2: No. When we regularize w2, the resulting boundary can rely less on the value of x2 and therefore becomes more vertical. L2 here seems to be more horizontal than the unregularized solution, so it cannot come as a result of penalizing w2.
L3: Yes. Here w2^2 is small relative to w1^2 (as evidenced by the high slope), and even though it would assign a rather low log-probability to the observed labels, it could be forced by a large regularization parameter C.
L4: No. For very large C, we get a boundary that is entirely vertical (the line x1 = 0, i.e., the x2 axis). L4 here is reflected across the x2 axis and represents a poorer solution than its counterpart on the other side. For moderate regularization we have to get the best solution that we can construct while keeping w2 small. L4 is not the best and thus cannot come as a result of regularizing w2.

If we change the form of regularization to one-norm (absolute value) and also regularize w1, we get the corresponding penalized log-likelihood. Consider again the problem in Figure 1(a) and the same linear logistic regression model. As we increase the regularization parameter C, which of the following scenarios do you expect to observe (choose only one):
(x) First w1 will become 0, then w2.
( ) w1 and w2 will become zero simultaneously.
( ) First w2 will become 0, then w1.
( ) None of the weights will become exactly zero, only smaller as C increases.

The data can be classified with zero training error, and therefore also with high log-probability, by looking at the value of x2 alone, i.e., making w1 = 0. Initially we might prefer to have a non-zero value for w1, but it will go to zero rather quickly as we increase regularization. Note that we pay a regularization penalty for a non-zero value of w1, and if it does not help classification, why would we pay the penalty? The absolute-value regularization ensures that w1 will indeed go to exactly zero. As C increases further, even w2 will eventually become zero. We pay a higher and higher cost for setting w2 to a non-zero value. Eventually this cost overwhelms the gain from the log-probability of labels that we can achieve with a non-zero w2. Note that when w1 = w2 = 0, the log-probability of labels is a finite value, n log(0.5).

SVM

1. Figure 4: Training set, maximum margin linear separator, and the support vectors (in bold).

(a) What is the leave-one-out cross-validation error estimate for maximum margin separation in Figure 4? (We are asking for a number.) (0)
Based on the figure we can see that removing any single point would not change the resulting maximum margin separator. Since all the points are initially classified correctly, the leave-one-out error is zero.

(b) We would expect the support vectors to remain the same in general as we move from a linear kernel to higher order polynomial kernels. (F)
There are no guarantees that the support vectors remain the same. The feature vectors corresponding to polynomial kernels are non-linear functions of the original input vectors, and thus the support points for maximum margin separation in the feature space can be quite different.

(c) Structural risk minimization is guaranteed to find the model (among those considered) with the lowest expected loss. (F)
We are guaranteed to find only the model with the lowest upper bound on the expected loss.

(d) What is the VC-dimension of a mixture of two Gaussians model in the plane with equal covariance matrices? Why?
A mixture of two Gaussians with equal covariance matrices has a linear decision boundary. Linear separators in the plane have VC-dimension exactly 3.

4. SVM: classify the following data points.

(a) Plot these six training points. Are the classes {+, -} linearly separable? Yes.
(b) Construct the weight vector of the maximum margin hyperplane by inspection and identify the support vectors.
The maximum margin hyperplane should have a slope of -1 and pass through (3/2, 0). Therefore its equation is x1 + x2 = 3/2, and the weight vector is (1, 1)^T.
(c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the same, or increase?
In this specific dataset the optimal margin increases when we remove the support vectors (1, 0) or (1, 1), and stays the same when we remove the other two.
(d) (Extra credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give a short proof.
When we drop some constraints in a constrained maximization problem, we get an optimal value which is at least as good as the previous one. This is because the set of candidates satisfying the original (larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker) set of constraints. So, for the weaker constraints, the old optimal solution is still available, and there may be additional solutions that are even better. Finally, note that in SVM problems we are maximizing the margin subject to the constraints given by the training points. When we drop any of the constraints, the margin can increase or stay the same depending on the dataset. In general problems with realistic datasets it is expected that the margin increases when we drop support vectors. The data in this problem is constructed to demonstrate that, when removing some constraints, the margin can stay the same or increase depending on the geometry.

2. SVM: classify the following set of three data points.

(a) Are the classes {+, -} linearly separable? No.
(b) Consider mapping each point to 3-D using the new feature vector Phi(x) = (1, sqrt(2) x, x^2). Are the classes now linearly separable?
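The lifting in part (b) can be checked directly. The three points and their labels below are an assumption (the original figure is not reproduced): x = -1, 0, 1 with labels +1, -1, +1, which no linear threshold on the line can separate, but which become separable after Phi(x) = (1, sqrt(2) x, x^2):

```python
import math

# Assumed 1-D data: '+' at x = -1 and x = 1, '-' at x = 0 (figure not reproduced).
pts = [(-1.0, +1), (0.0, -1), (1.0, +1)]

def phi(x):
    # 3-D feature map associated with the quadratic kernel.
    return (1.0, math.sqrt(2) * x, x * x)

# No linear threshold on the raw line separates the +, -, + pattern ...
def sep_1d(w, b):
    return all(y * (w * x + b) > 0 for x, y in pts)

assert not any(sep_1d(w, b) for w in (-2, -1, 1, 2)
               for b in (-1, -0.5, 0, 0.5, 1))

# ... but in feature space the weight vector (0, 0, 1) with bias -1/2 does:
w3, b3 = (0.0, 0.0, 1.0), -0.5
margins = [y * (sum(wi * zi for wi, zi in zip(w3, phi(x))) + b3)
           for x, y in pts]
print(margins)  # all positive -> linearly separable in 3-D
```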
If so, find a separating hyperplane. The three points are now separable in 3-dimensional space; a separating hyperplane is given by the weight vector (0, 0, 1) in the new space, as seen in the figure.

(c) Define a class variable y_i in {-1, +1} which denotes the class of x_i, and let w = (w1, w2, w3)^T. The max-margin SVM classifier solves the following problem:

    min (1/2) ||w||^2  subject to  y_i (w^T Phi(x_i) + b) >= 1,  i = 1, 2, 3.

Using the method of Lagrange multipliers, show that the solution is w_hat = (0, 0, 2), b_hat = -1, and that the margin is 1/||w_hat|| = 1/2.

For optimization problems with inequality constraints such as the above, we should apply the KKT conditions, which are a generalization of Lagrange multipliers. However, this problem can be solved more easily by noting that we have three vectors in the 3-dimensional space and all of them are support vectors. Hence all 3 constraints hold with equality, and we can apply the method of Lagrange multipliers directly.

(d) Show that the solution remains the same if the constraints are changed to y_i (w^T Phi(x_i) + b) >= gamma for any gamma >= 1.
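A quick numeric check of the solution claimed in part (c), under the same assumed points x = -1, 0, 1 with labels +1, -1, +1: w_hat = (0, 0, 2), b_hat = -1 meets every constraint with equality (all three points are support vectors), the margin is 1/||w_hat|| = 1/2, and — as in part (d) — replacing the constraint constant 1 by gamma simply rescales (w_hat, b_hat) without moving the boundary.

```python
import math

pts = [(-1.0, +1), (0.0, -1), (1.0, +1)]   # assumed data, as above

def phi(x):
    return (1.0, math.sqrt(2) * x, x * x)

w_hat, b_hat = (0.0, 0.0, 2.0), -1.0

# All three constraints y_i (w . phi(x_i) + b) >= 1 hold with equality:
acts = [y * (sum(wi * zi for wi, zi in zip(w_hat, phi(x))) + b_hat)
        for x, y in pts]
print(acts)                                   # [1.0, 1.0, 1.0]

margin = 1.0 / math.sqrt(sum(wi * wi for wi in w_hat))
print(margin)                                 # 0.5

# Scaling the constraints to >= gamma: (gamma*w, gamma*b) is feasible and
# defines the same decision boundary {z : w . z + b = 0}.
gamma = 3.0
acts_scaled = [y * (sum(gamma * wi * zi for wi, zi in zip(w_hat, phi(x)))
                    + gamma * b_hat) for x, y in pts]
print(acts_scaled)                            # all equal to gamma
```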
(Extra credit) Is the answer to (d) also true for any dataset and any gamma >= 1? Provide a counterexample or give a short proof.

SVM. Suppose we only have four training examples in two dimensions (see figure above): positive at x1 = [0, 0], x2 = [2, 2] and negative at x3 = [h, 1], x4 = [0, 3], where we treat 0 <= h <= 3 as a parameter.

(a) How large can h >= 0 be so that the training points are still linearly separable? Up to h = 1.
(b) Does the orientation of the maximum margin decision boundary change as a function of h when the points are separable (Y/N)? No, because x1, x2, x3 remain the support vectors.
(c) What is the margin achieved by the maximum margin boundary as a function of h? [Hint: it turns out that the margin as a function of h is a linear function.]
(d) Assume that we can only observe the second component of the input vectors. Without the other component, the labeled training points reduce to (0, y = 1), (2, y = 1), (1, y = -1), and (3, y = -1). What is the lowest order p of polynomial kernel that would allow us to correctly classify these points?
The classes of the points on the line appear in the order 1, -1, 1, -1. Therefore, we need a cubic polynomial (p = 3).

3. LDA

Using a set of 100 labeled training examples (two classes), we train the following models:
GaussI: a Gaussian model (one Gaussian per class), where the covariance matrices are both set to I.
GaussX: a Gaussian model (one Gaussian per class) without any restrictions on the covariance matrices.
LinLog: a logistic regression model with linear features.
QuadLog: a logistic regression model, using all linear and quadratic features.

(a) After training, we measure for each model the average log probability of labels given the examples in the training set. Specify all the equalities or inequalities that must hold between the models relative to this performance measure. We are looking for statements like "model 1 <= model 2" or "model 1 = model 2". If no such statement holds, write "none".

GaussI <= LinLog (both have logistic posteriors, and LinLog is the logistic model maximizing the average log probabilities).
GaussX <= QuadLog (both have logistic posteriors with quadratic features, and QuadLog is the model of this class maximizing the average log probabilities).
LinLog <= QuadLog (logistic regression models with linear features are a subclass of logistic regression models with quadratic features — the maximum from the superclass is at least as high as the maximum from the subclass).
GaussI <= QuadLog (follows from the above inequalities).
(GaussX will have higher average log joint probabilities of examples and labels than will GaussI. But having higher average log joint probabilities does not necessarily translate into higher average log conditional probabilities.)

(b) Which equalities and inequalities must always hold if we instead use the mean classification error in the training set as the performance measure? Again use the format "model 1 <= model 2" or "model 1 = model 2". Write "none" if no such statement holds.
None. Having higher average log conditional probabilities, or average log joint probabilities, does not necessarily translate to higher or lower classification error. Counterexamples can be constructed for all pairs in both directions. Although no inequality always holds, it is commonly the case that GaussX <= GaussI and that QuadLog <= LinLog. (Partial credit of up to two points was awarded for these inequalities.)

5. We consider here generative and discriminative approaches for solving the classification problem illustrated in the figure. Specifically, we will use a mixture of Gaussians model and regularized logistic regression models. [Figure: labeled training set, where "+" corresponds to class y = 1.]

(a) We will first estimate a mixture of Gaussians model, one Gaussian per class, with the constraint that the covariance matrices are identity matrices. The mixing proportions (class frequencies) and the means of the two Gaussians are free parameters. Plot the maximum likelihood estimates of the means of the two class-conditional Gaussians in the figure. Mark the means as points "x" and label them "0" and "1" according to the class.
The means should be close to the centers of mass of the points.

(b) Draw the decision boundary in the same figure.
Since the two classes have the same number of points and the same covariance matrices, the decision boundary is a line and, moreover, should be drawn as the orthogonal bisector of the line segment connecting the class means.

(c) We have also trained regularized linear logistic regression models for the same data. The regularization penalties, used in penalized conditional log-likelihood estimation, were -C w_i^2, where i = 0, 1, 2. In other words, only one of the parameters was regularized in each case. Based on the data in the figure, we generated three plots, one for each regularized parameter, of the number of misclassified training points as a function of C. The three plots are not identified with the corresponding parameters, however. Please assign the "top", "middle", and "bottom" plots to the correct parameter, w0, w1, or w2 — the parameter that was regularized in the plot. Provide a brief justification for each assignment.

"top" = w1: by strongly regularizing w1 we force the boundary to be horizontal in the figure. The logistic regression model tries to maximize the log-probability of classifying the data correctly. The highest penalty comes from the misclassified points, and thus the boundary will tend to balance the (worst) errors. In the figure this is, roughly speaking, the x2 = 1 line, resulting in 4 errors.
"middle" = w0: if we regularize w0, then the boundary will eventually go through the origin (bias term set to zero). Based on the figure we can find a good linear boundary through the origin with only one error.
"bottom" = w2: the training error is unaffected if we regularize w2 (constraining the boundary to be vertical); the value of w2 would be small already without regularization.

4. [midterm 2009, problem 4]

6. Consider two classifiers: 1) an SVM with a quadratic (second order polynomial) kernel function, and 2) an unconstrained mixture of two Gaussians model, one Gaussian per class label. These classifiers try to map examples in R^2 to binary labels. We assume that the problem is separable, no slack penalties are added to the SVM classifier, and that we have sufficiently many training examples to estimate the covariance matrices of the two Gaussian components.

(a) The two classifiers have the same VC-dimension. (T)
(b) Suppose we evaluated the structural risk minimization score for the two classifiers. The score is the bound on the expected loss of the classifier, when the classifier is estimated on the basis of n training examples. Which of the two classifiers might yield the better (lower) score? Provide a brief justification.
The SVM would probably get a better score. Both classifiers have the same complexity penalty, but the SVM would better optimize the training error, resulting in a lower (or equal) overall score.

[final 2004] 2. We estimated a mixture of two Gaussians model based on two-dimensional data shown in the figure below. The mixture was initialized randomly in two different ways and run for three iterations based on each initialization. However, the figures got mixed up (yes, again!). Please draw an arrow from one figure to another to indicate how they follow from each other (you should draw only four arrows). [Figure: mixture model with EM, two initializations, three iterations for each.]

We also wanted to try another two models based on the same n observations; you can assume that the parameters are unconstrained to the extent possible. How much higher log-likelihood would Model 2 have to assign to the training data for us to select this model with the Bayesian Information Criterion (BIC)?
Model 2 has one more parameter than Model 1. Thus the log-likelihood of Model 2 would have to overcome the additional complexity penalty (1/2) log(n) it has in the BIC criterion.

Boosting

1. Figure 2: h1 is chosen at the first iteration of boosting; what is the weight alpha_1 assigned to it?
Figure 2 shows a dataset of 8 points, equally divided among the two classes (positive and negative). The figure also shows a particular choice of decision stump h1 picked by AdaBoost in the first iteration. What is the weight alpha_1 that will be assigned to h1 by AdaBoost? (Initial weights of all the data points are equal, i.e., 1/8.)
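The weight can be computed directly from the AdaBoost formula alpha = (1/2) log((1 - eps)/eps), using the weighted error implied by the figure (one of the eight equally weighted points misclassified):

```python
import math

# First boosting round on 8 equally weighted points: the stump h1
# misclassifies exactly one of them, so its weighted error is 1/8.
eps1 = 1.0 / 8
alpha1 = 0.5 * math.log((1 - eps1) / eps1)

print(alpha1, 0.5 * math.log(7))  # identical: alpha1 = (1/2) log 7 ≈ 0.973
```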
The weighted training error is eps_1 = 1/8, so

    alpha_1 = (1/2) log((1 - eps_1)/eps_1) = (1/2) log((1 - 1/8)/(1/8)) = (1/2) log 7.

AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. (F)
Not if the data in the training set cannot be separated by a linear combination of the specific type of weak classifiers we are using.
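The reweighting mechanics behind several of these True/False items can be sketched: all misclassified points are scaled up by the same factor e^{alpha}, all correct ones are scaled down by e^{-alpha}, and after normalization the stump just chosen has weighted error exactly 1/2.

```python
import math

# One AdaBoost round on 8 equally weighted points where the stump h1
# misclassifies exactly one point (weighted error eps = 1/8).
w = [1.0 / 8] * 8
correct = [True] * 7 + [False]          # h1 gets the last point wrong

eps = sum(wi for wi, c in zip(w, correct) if not c)      # 1/8
alpha = 0.5 * math.log((1 - eps) / eps)                  # = (1/2) log 7

# Reweight: misclassified points multiplied by e^{alpha} (one common factor),
# correctly classified ones by e^{-alpha}; then normalize.
w = [wi * math.exp(alpha if not c else -alpha) for wi, c in zip(w, correct)]
Z = sum(w)
w = [wi / Z for wi in w]

# After the update, h1's weighted error is exactly 1/2 — which is why the
# same stump is never chosen twice in a row.
err_after = sum(wi for wi, c in zip(w, correct) if not c)
print(alpha, err_after)  # (1/2) log 7 and 0.5
```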
The votes alpha_i assigned to the weak classifiers in boosting generally go down as the algorithm proceeds, because the weighted training error of the weak classifiers tends to go up. (T)
In the course of boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak component classifiers. The weighted training error of the components therefore tends to go up and, as a result, their votes go down.

The votes alpha_i assigned to the classifiers assembled by AdaBoost are always non-negative. (T)
As defined in class, AdaBoost will choose classifiers with weighted training error no worse than 1/2. This ensures that (1 - eps)/eps >= 1, and therefore the vote is non-negative. Note that if a classifier does worse than 1/2 we can always "flip" the sign of its predictions and thereby get a classifier that does slightly better than 1/2. The vote assigned to the "flipped" classifier would be non-negative.

2. Figure: labeled examples, weights on the examples, and three possible stumps.

In the figure, red 'o' points correspond to negative examples (y_t = -1) and blue '+' points are positive examples (y_t = +1). The figure also shows the normalized weights on the examples resulting from having run the AdaBoost algorithm for some number of iterations. There are also three decision stumps drawn in the figure, h(x; thetaA), h(x; thetaB), and h(x; thetaC), or A, B and C for short.

(a) Which one of the stumps would you use at the next iteration (please answer A, B, or C)? B. The weighted error of stump B is the lowest among the three stumps.
(b) Which one of the stumps was used at the previous iteration to obtain the weights on the examples shown in the figure (please answer A, B, or C)? C. The stump that was selected at the previous round has to have weighted error exactly 1/2 at the current round. This is true for stump C.
(c) In the figure, circle the training point(s) (possibly none) that the ensemble h2(x) = h(x; thetaA) + h(x; thetaC) cannot classify correctly.

3. For this problem we are given a training set D = {(x1, y1), ..., (xn, yn)} of examples and labels. We have no other data available. We will use boosting as a feature selection method for an SVM classifier. So, we follow the boosting algorithm for m rounds based on D to get m decision stumps h(x; theta_1), ..., h(x; theta_m) (we will drop the "votes" generated by the boosting algorithm). After this we can collect the base classifier predictions into feature vectors for each training example t = 1, ..., n. To train SVM classifiers based on these feature vectors we will split the dataset D into two equal sets Dtr and Dte, and use Dtr for training and reserve Dte for evaluating the performance of the resulting classifier.

(a) Could we use the value of the margin obtained by the hard margin SVM classifiers on Dtr as a criterion for selecting between the two kernels below? (N)
(b) Suppose we train an SVM classifier with kernel K1(x, x') based on Dtr and evaluate its performance on Dte. Does the performance on Dte provide a fair measure of how well the classifier is going to work on unseen examples (from the same distribution)?
No, classification performance on Dte is not a fair measure, since the features — the stumps — that the SVM classifier relies on were estimated on the basis of D = {Dte, Dtr}, i.e., including Dte.

4. Consider building an ensemble of decision stumps with the AdaBoost algorithm. Figure 2 shows the labeled points in two dimensions as well as the first stump we have chosen. Stumps predict binary ±1 values and, as linear classifiers, depend only on one of the coordinate values. The little arrow in the figure is the
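Problem 4 is cut off in the source, but the stump-selection step it builds on (also used in problem 2 above) is mechanical: scan each coordinate, candidate threshold, and sign for the lowest weighted error. A sketch with made-up 2-D points, labels, and weights:

```python
# Weighted decision-stump search: h(x; j, t, s) = s if x[j] > t else -s.
# Points, labels, and weights below are illustrative only.
pts = [(0.5, 2.0), (1.5, 2.5), (2.5, 0.5), (3.0, 2.0)]
ys  = [+1, +1, -1, -1]
wts = [0.1, 0.4, 0.4, 0.1]

def stump_error(j, t, s):
    # Weighted 0/1 error of the stump on the current example weights.
    err = 0.0
    for p, y, w in zip(pts, ys, wts):
        pred = s if p[j] > t else -s
        if pred != y:
            err += w
    return err

best = None
for j in (0, 1):                          # coordinate the stump looks at
    for t in sorted({p[j] for p in pts}):
        for s in (+1, -1):                # polarity of the stump
            e = stump_error(j, t - 0.5, s)   # threshold offset between values
            if best is None or e < best[0]:
                best = (e, j, t - 0.5, s)

print(best)  # (weighted error, coordinate, threshold, sign) of the best stump
```

Here the stump on coordinate 0 with threshold 2.0 and sign -1 classifies every weighted point correctly, so the search returns weighted error 0.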