Machine Learning Question Bank
1. ML estimation of an exponential model (10 points)

A Gaussian distribution is often used to model data on the real line, but is sometimes inappropriate when the data are often close to zero but constrained to be nonnegative. In such cases one can fit an exponential distribution, whose probability density function is given by

p(x|b) = (1/b) exp(-x/b).

Given N observations x_i drawn from such a distribution:
(a) Write down the likelihood as a function of the scale parameter b.
(b) Write down the derivative of the log likelihood.
(c) Give a simple expression for the ML estimate of b.

Now repeat the exercise for the Poisson distribution

p(x|λ) = e^{-λ} λ^x / x!,  x = 0, 1, 2, ...

The log likelihood of N observations is

l(λ) = Σ_{i=1}^{N} (x_i log λ − λ − log x_i!),

its derivative is dl/dλ = (Σ_{i=1}^{N} x_i)/λ − N, and setting the derivative to zero gives the ML estimate λ̂ = (1/N) Σ_{i=1}^{N} x_i. (For a Bernoulli likelihood p^m (1−p)^{N−m} with m successes in N trials, the same procedure gives p̂ = m/N.)
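Both estimates reduce to the sample mean. As a quick numerical check (not part of the original question), here is a small NumPy sketch; the true parameter values, sample sizes, and variable names are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential with scale b: p(x|b) = (1/b) exp(-x/b).
# Setting d/db log L = 0 gives b_hat = (1/N) * sum(x_i), the sample mean.
x_exp = rng.exponential(scale=2.0, size=10_000)
b_hat = x_exp.mean()

# Poisson with rate lam: p(x|lam) = exp(-lam) lam^x / x!.
# d/dlam log L = sum(x_i)/lam - N = 0 gives lam_hat = (1/N) * sum(x_i).
x_poi = rng.poisson(lam=3.5, size=10_000)
lam_hat = x_poi.mean()

print(f"exponential: true b = 2.0, ML estimate = {b_hat:.3f}")
print(f"poisson:     true lambda = 3.5, ML estimate = {lam_hat:.3f}")
```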

2. Conjugate priors

(a) Given a likelihood p(x|θ) for a class of models with parameters θ, a conjugate prior is a distribution p(θ|γ) with hyperparameters γ such that the posterior distribution

p(θ|X, γ) ∝ p(X|θ) p(θ|γ)

belongs to the same family as the prior.

(b) Suppose that the likelihood is given by the exponential distribution with rate parameter λ:

p(x|λ) = λ e^{-λx}.

Show that the gamma distribution, Gamma(λ|α, β) ∝ λ^{α−1} e^{−βλ}, is a conjugate prior for the exponential. Derive the parameter update given observations x_1, ..., x_N and the prediction distribution p(x_{N+1}|x_1, ..., x_N).

(c) Show that the beta distribution is a conjugate prior for the geometric distribution

p(x = k|θ) = θ (1−θ)^{k−1},

which describes the number of times a coin is tossed until the first heads appears, when the probability of heads on each toss is θ. Derive the parameter update rule and prediction distribution.
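For reference, the standard updates that parts (b) and (c) ask for; the hyperparameter names α, β and a, b are our own labels.

With a Gamma(α, β) prior and exponential observations x_1, ..., x_N,

p(λ|x_1, ..., x_N) ∝ λ^{α+N−1} e^{−(β + Σ_i x_i)λ} = Gamma(α + N, β + Σ_i x_i),

and integrating λ out gives the predictive density

p(x_{N+1}|x_1, ..., x_N) = α'·β'^{α'} / (β' + x_{N+1})^{α'+1},  where α' = α + N and β' = β + Σ_i x_i.

With a Beta(a, b) prior and geometric observations k_1, ..., k_N, the posterior is Beta(a + N, b + Σ_i (k_i − 1)). Part (d) below builds on these single-component updates.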

(d) Suppose p(θ|γ) is a conjugate prior for the likelihood p(x|θ). Show that the mixture prior

p(θ|γ_1, ..., γ_M) = Σ_{m=1}^{M} w_m p(θ|γ_m)

is also conjugate for the same likelihood, assuming the mixture weights w_m sum to 1.

(e) Repeat part (c) for the case where the prior is a single distribution and the likelihood is a mixture, and the prior is conjugate for each mixture component of the likelihood. (Some priors can be conjugate for several different likelihoods; for example, the beta is conjugate for the Bernoulli and the geometric distributions, and the gamma is conjugate for the exponential and for the gamma with fixed α.)

(f) (Extra credit, 20 points) Explore the case where the likelihood is a mixture with fixed components and unknown weights, i.e., the weights are the parameters to be learned.

True/False questions

( ) ... decreases as the number of training examples n increases.
(3) For regression functions A and B, if A is simpler than B, then A will almost certainly perform better than B on the test set.
(5) Boosting and Bagging both combine multiple classifiers by voting, and both weight each individual classifier according to its accuracy.
(6) In the boosting iterations, the training error of each new decision stump and the training error of the combined classifier vary roughly in concert. (F) While the training error of the combined classifier typically decreases as a function of boosting iterations, the error of the individual decision stumps typically increases, since the weights become concentrated at the most difficult examples.
(7) One advantage of Boosting is that it does not overfit. (F)
( ) Support vector machines are resistant to outliers, i.e., very noisy examples drawn from a different distribution. ( )
( ) In regression, subset selection becomes computationally expensive when the number of features is large; the Lasso can also perform feature selection. ( )
( ) Gradient descent can get trapped in local minima, but the EM algorithm cannot. ( )
( ) In the AdaBoost algorithm, the weights on all the misclassified points will go up by the same multiplicative factor. (T)
( ) In a least-squares linear regression problem, adding an L2 regularization penalty cannot decrease the L2 error of the solution ŵ on the training data. ( )
( ) In a least-squares linear regression problem, adding an L2 regularization penalty always decreases the expected L2 error of the solution ŵ on unseen test data. ( )
( ) Besides the EM algorithm, gradient descent can also be used to fit a mixture-of-Gaussians model. (T)
( ) Any decision boundary that we get from a generative model with class-conditional Gaussian distributions could in principle be reproduced with an SVM and a polynomial kernel. True! In fact, since class-conditional Gaussians always yield quadratic decision boundaries, they can be reproduced with an SVM with a kernel of degree less than or equal to two.
( ) AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. False! If the data are not separable by a linear combination of the weak classifiers, AdaBoost cannot achieve zero training error.
( ) The log-likelihood of the data will always increase through successive iterations of the expectation maximization algorithm. (F)
( ) In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)

Regression

1. (10 points) Figure: mean log-probability of the labels as a function of the regularization parameter C. Is the claim that the mean log-probability of the training labels can never increase as C increases correct? Justify your answer, and explain why the test-set log-probability decreases for large values of C.
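The following small sketch (not part of the original exam) illustrates the training-set half of question 1: as the quadratic penalty weight C grows, the mean log-probability of the training labels can only go down. The synthetic data, the plain full-batch gradient ascent, and all names below are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))
w_true = np.array([2.0, -3.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def mean_log_prob(w, X, y):
    z = X @ w
    # mean of log p(y|x) = y*z - log(1 + e^z), written in a numerically stable form
    return np.mean(y * z - np.logaddexp(0.0, z))

def fit(X, y, C, steps=5000, lr=0.05):
    # maximize (1/n) sum_i log p(y_i|x_i, w) - (C/2)||w||^2 by gradient ascent
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * (X.T @ (y - p) / len(y) - C * w)
    return w

for C in [0.0, 0.1, 1.0, 10.0]:
    w = fit(X, y, C)
    print(f"C={C:5.1f}  mean train log-prob = {mean_log_prob(w, X, y):.3f}")
```

The printed values should be non-increasing in C, matching the claim; the test-set behaviour (improving at first, then collapsing for large C) depends on how much the unregularized model overfits.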

2. Linear model. Assume the data are generated as y ~ N(w_0 + w_1·x, σ²); the training set (100 points) is shown in the figure.
(1) Estimate the parameters by maximum likelihood and sketch the fitted model in panel (a). (3 points)
(2) Now use MAP estimation instead, i.e., add a quadratic penalty C·Σ_i w_i² to the log-likelihood objective. Sketch in panel (b) the model obtained when C is very large. (3 points)
(3) After adding this penalty with large C, does the fitted variance σ² of the Gaussian become larger, smaller, or stay the same? (4 points)
(Figure: the data, with empty panels (a) and (b) for the sketches.)
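A minimal sketch suggesting the answer to part (3) above, on synthetic data of our own (the true line, the penalty value, and the decision to penalize both coefficients are assumptions made only for illustration): with a very large quadratic penalty the fitted line flattens, so the maximum-likelihood estimate of the noise variance, i.e., the mean squared residual, grows.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)
X = np.column_stack([np.ones_like(x), x])       # columns: bias, x

def penalized_fit(X, y, C):
    # minimize ||y - Xw||^2 + C*||w||^2  (ridge normal equations)
    return np.linalg.solve(X.T @ X + C * np.eye(X.shape[1]), X.T @ y)

for C in [0.0, 1e6]:
    w = penalized_fit(X, y, C)
    sigma2_hat = np.mean((y - X @ w) ** 2)      # ML estimate of the noise variance
    print(f"C={C:>9.1f}  w={np.round(w, 3)}  sigma2_hat={sigma2_hat:.3f}")
```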

3. The input has two features x = (x_1, x_2)^T, j = 1, 2, and the data are generated as y ~ N(10·x_1·x_2 − 7·x_1 + 2.5·x_2, σ²). We consider candidate regression models of four different complexities, labeled 1, 2, 8, and 10. Each candidate is fit to a training set of n points and then evaluated on a large independent test set. For each of the three columns of the table (smallest training error, largest training error, smallest test error), select the model(s) that could be the appropriate choice; more than one answer may be possible. Then repeat the selection for a much larger training set, n = 10^6.

The approximation error of a polynomial regression model depends on the number of training points. (T)
The structural error of a polynomial regression model depends on the number of training points. (F)

4. We are trying to learn regression parameters for a dataset which we know was generated from a polynomial of a certain degree, but we do not know what this degree is. Assume the data was actually generated from a polynomial of degree 5 with some added Gaussian noise, that is

y = w_0 + w_1·x + w_2·x² + w_3·x³ + w_4·x⁴ + w_5·x⁵ + ε,  ε ~ N(0, σ²).

For training we are given a set of {x, y} pairs, and for testing we use an additional, independent set of {x, y} pairs. Since we do not know the degree of the polynomial, we learn two models from the data: model A fits a polynomial of degree 4 and model B fits a polynomial of degree 6. Which of these two models is likely to fit the test data better?
Answer: the degree-6 polynomial. Since the true model is a degree-5 polynomial and we have enough training data, the degree-6 model will likely fit a very small coefficient for x⁶. Thus, even though it is a degree-6 polynomial, it will behave very much like the correct fifth-degree model, leading to a better fit to the data.

5. Input-dependent noise in regression.
Ordinary least-squares regression is equivalent to assuming that each data point is generated according to a linear function of the input plus zero-mean, constant-variance Gaussian noise. In many systems, however, the noise variance is itself a positive linear function of the input (which is assumed to be non-negative, i.e., x ≥ 0).

(a) Which of the following families of probability models correctly describes this situation in the univariate case? (Hint: only one of them does.)
Answer: (iii) is correct. In a Gaussian distribution over y, the variance is determined by the coefficient of y²; so by replacing 1/(2σ²) with 1/(2σ²x), we get a variance that increases linearly with x. (Note also the change to the normalization "constant".) (i) has quadratic dependence on x; (ii) does not change the variance at all, it just renames w_1.

(b) Circle the plots in Figure 1 that could plausibly have been generated by some instance of the model family (or families) you chose.
Answer: (ii) and (iii). (Note that (iii) works for σ² = 0.) (i) exhibits a large variance at x = 0, and the variance appears independent of x.

(c) True/False: Regression with input-dependent noise gives the same solution as ordinary regression for an infinite data set generated according to the corresponding model.
True. In both cases the algorithm will recover the true underlying model.

(d) For the model you chose in part (a), write down the derivative of the negative log likelihood with respect to w_1.

Classification

1. Generative vs. discriminative models

Your billionaire friend needs your help. She needs to classify job applications into good/bad categories, and also to detect job applicants who lie in their applications, using density estimation to detect outliers. To meet these needs, do you recommend using a discriminative or a generative classifier? Why?
A generative model: it models p(x|y), which is what density estimation requires.

Your billionaire friend also wants to classify software applications to detect bug-prone applications using features of the source code. This pilot project only has a few applications to be used as training data, though. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A discriminative model: with few training samples, directly modeling the classification rule usually works better.

(d) Finally, your billionaire friend also wants to classify companies to decide which one to acquire. This project has lots of training data based on several decades of research. To create the most accurate classifier, do you recommend using a discriminative or a generative classifier? Why?
A generative model: with a large number of training samples, the correct generative model can be learned.

2. Logistic regression

Figure 2: log-probability of labels as a function of the regularization parameter C.

Here we use a logistic regression model to solve a classification problem. In Figure 2, we have plotted the mean log-probability of labels in the training and test sets after having trained the classifier with a quadratic regularization penalty and different values of the regularization parameter C.

1. In training a logistic regression model by maximizing the likelihood of the labels given the inputs, we have multiple locally optimal solutions. (F)
Answer: The log-probability of labels given examples implied by the logistic regression model is a concave (convex down) function with respect to the weights. The (only) locally optimal solution is also globally optimal.
2. A stochastic gradient algorithm for training logistic regression models with a fixed learning rate will find the optimal setting of the weights exactly. (F)
Answer: A fixed learning rate means that we are always taking a finite step towards improving the log-probability of any single training example in the update equation. Unless the examples are somehow "aligned", we will continue jumping from side to side of the optimal solution, and will not be able to get arbitrarily close to it. The learning rate has to approach zero in the course of the updates for the weights to converge.

3. The average log-probability of training labels as in Figure 2 can never increase as we increase C. (T)
Stronger regularization means more constraints on the solution, and thus the (average) log-probability of the training examples can only get worse.

4. Explain why in Figure 2 the test log-probability of labels decreases for large values of C.
As C increases, we give more weight to constraining the predictor and thus less flexibility to fitting the training set. The increased regularization guarantees that the test performance gets closer to the training performance, but as we over-constrain the allowed predictors we are no longer able to fit the training set at all, and although the test performance is now very close to the training performance, both are low.

5. The log-probability of labels in the test set would decrease for large values of C even if we had a large number of training examples. (T)
The above argument still holds, but the value of C for which we would observe such a decrease scales up with the number of examples.

6. Adding a quadratic regularization penalty for the parameters when estimating a logistic regression model ensures that some of the parameters (weights associated with the components of the input vectors) vanish. (F)
A regularization penalty suitable for feature selection must have a non-zero derivative at zero. Otherwise the regularization has no effect at zero, and each weight will tend to be slightly non-zero even when this does not improve the log-probabilities by much.

3. Regularized logistic regression

This problem refers to the binary classification task depicted in Figure 1(a), which we attempt to solve with the simple linear logistic regression model

P(y = 1|x, w_1, w_2) = g(w_1·x_1 + w_2·x_2) = 1/(1 + exp(−w_1·x_1 − w_2·x_2))

(for simplicity we do not use the bias parameter w_0). The training data can be separated with zero training error; see line L1 in Figure 1(b) for instance.
(Figure 1: (a) the two-dimensional data set used in this problem; (b) the points can be separated by L1 (solid line). Possible other decision boundaries are shown by L2, L3, L4.)

(a) Consider a regularization approach where we try to maximize

Σ_i log P(y_i|x_i, w_1, w_2) − (C/2)·w_2²

for large C. Note that only w_2 is penalized. We would like to know which of the four lines in Figure 1(b) could arise as a result of such regularization. For each potential line L2, L3, or L4, determine whether it can result from regularizing w_2; if not, explain very briefly why not.
L2: No. When we regularize w_2 the resulting boundary can rely less on the value of x_2 and therefore becomes more vertical. L2 here seems to be more horizontal than the unregularized solution, so it cannot come as a result of penalizing w_2.
L3: Yes. Here w_2² is small relative to w_1² (as evidenced by the high slope), and even though it would assign a rather low log-probability to the observed labels, it could be forced by a large regularization parameter C.
L4: No. For very large C we get a boundary that is entirely vertical (the line x_1 = 0, i.e., the x_2 axis). L4 here is reflected across the x_2 axis and represents a poorer solution than its counterpart on the other side. For moderate regularization we have to get the best solution that we can construct while keeping w_2 small. L4 is not the best and thus cannot come as a result of regularizing w_2.

(b) If we change the form of regularization to one-norm (absolute value) and also regularize w_1, we get the following penalized log-likelihood:

Σ_i log P(y_i|x_i, w_1, w_2) − C·(|w_1| + |w_2|).

Consider again the problem in Figure 1(a) and the same linear logistic regression model. As we increase the regularization parameter C, which of the following scenarios do you expect to observe? (Choose only one.)
(x) First w_1 will become 0, then w_2.
( ) w_1 and w_2 will become zero simultaneously.
( ) First w_2 will become 0, then w_1.
( ) None of the weights will become exactly zero, only smaller as C increases.
Answer: The data can be classified with zero training error, and therefore also with high log-probability of labels, by looking at the value of x_2 alone, i.e., by making w_1 = 0. Initially we might prefer to have a non-zero value for w_1, but it will go to zero rather quickly as we increase regularization. Note that we pay a regularization penalty for a non-zero value of w_1, and if it does not help classification, why would we pay the penalty? The absolute-value regularization ensures that w_1 will indeed go to exactly zero. As C increases further, even w_2 will eventually become zero: we pay a higher and higher cost for setting w_2 to a non-zero value, and eventually this cost overwhelms the gain from the log-probability of labels that we can achieve with a non-zero w_2. Note that when w_1 = w_2 = 0, the log-probability of labels is a finite value, n·log(0.5).

SVM

1. Figure 4: training set, maximum-margin linear separator, and the support vectors (in bold).

(a) What is the leave-one-out cross-validation error estimate for maximum margin separation in Figure 4? (We are asking for a number.)
0. Based on the figure we can see that removing any single point would not change the resulting maximum-margin separator. Since all the points are initially classified correctly, the leave-one-out error is zero.

(b) We would expect the support vectors to remain the same in general as we move from a linear kernel to higher-order polynomial kernels. (F)
There are no guarantees that the support vectors remain the same. The feature vectors corresponding to polynomial kernels are non-linear functions of the original input vectors, and thus the support points for maximum-margin separation in the feature space can be quite different.

(c) Structural risk minimization is guaranteed to find the model (among those considered) with the lowest expected loss. (F)
We are guaranteed to find only the model with the lowest upper bound on the expected loss.

(d) What is the VC-dimension of a mixture of two Gaussians model in the plane with equal covariance matrices? Why?
A mixture of two Gaussians with equal covariance matrices has a linear decision boundary. Linear separators in the plane have VC-dimension exactly 3.

4. SVM. Classify the following data points:

(a) Plot these six training points. Are the classes {+, −} linearly separable?
Yes.

(b) Construct the weight vector of the maximum-margin hyperplane by inspection and identify the support vectors.
The maximum-margin hyperplane has slope −1 and passes through (x_1, x_2) = (3/2, 0); its equation is therefore x_1 + x_2 = 3/2, and the weight vector is (1, 1)^T.

(c) If you remove one of the support vectors, does the size of the optimal margin decrease, stay the same, or increase?
In this specific dataset the optimal margin increases when we remove the support vector (1, 0) or (1, 1), and stays the same when we remove the other two.

(d) (Extra credit) Is your answer to (c) also true for any dataset? Provide a counterexample or give a short proof.
When we drop some constraints in a constrained maximization problem, we get an optimal value that is at least as good as the previous one, because the set of candidates satisfying the original (larger, stronger) set of constraints is a subset of the candidates satisfying the new (smaller, weaker) set of constraints. So, under the weaker constraints, the old optimal solution is still available and there may be additional solutions that are even better. In SVM problems we are maximizing the margin subject to the constraints given by the training points; when we drop any of the constraints, the margin can increase or stay the same, depending on the dataset. In general problems with realistic datasets it is expected that the margin increases when we drop support vectors. The data in this problem were constructed to demonstrate that, when removing some constraints, the margin can stay the same or increase depending on the geometry.

2. SVM. Classify the following set of three data points:

(a) Are the classes {+, −} linearly separable?
No.

(b) Consider mapping each point to 3-D using new feature vectors Φ(x). Are the classes now linearly separable? If so, find a separating hyperplane.
Written out, the mapped points are separable in 3-dimensional space; a separating hyperplane is given by the weight vector (0, 0, 1) in the new space, as seen in the figure.

(c) Define a class variable y_i ∈ {−1, +1} which denotes the class of x_i, and let w = (w_1, w_2, w_3)^T. The max-margin SVM classifier solves the problem

minimize (1/2)·||w||²  subject to  y_i·(w^T Φ(x_i) + b) ≥ 1,  i = 1, 2, 3.

Using the method of Lagrange multipliers, show that the solution is

ŵ = (0, 0, 2),  b̂ = −1,

and that the margin is 1/||ŵ|| = 1/2.
For optimization problems with inequality constraints such as the one above we should apply the KKT conditions, which generalize Lagrange multipliers. However, this problem can be solved more easily by noting that we have three vectors in the 3-dimensional space and all of them are support vectors, so all three constraints hold with equality; we can therefore apply the method of Lagrange multipliers directly.

(d) Show that the solution remains the same if the constraints are changed to y_i·(w^T Φ(x_i) + b) ≥ ρ for any ρ ≥ 1.
(e) (Extra credit) Is the answer to (d) also true for any dataset and any ρ ≥ 1? Provide a counterexample or give a short proof.

SVM. Suppose we only have four training examples in two dimensions (see figure above): positive examples at x_1 = [0, 0], x_2 = [2, 2] and negative examples at x_3 = [h, 1], x_4 = [0, 3], where we treat 0 ≤ h ≤ 3 as a parameter.

(a) How large can h ≥ 0 be so that the training points are still linearly separable?
Up to h = 1.

(b) Does the orientation of the maximum-margin decision boundary change as a function of h when the points are separable (Y/N)?
No, because x_1, x_2, x_3 remain the support vectors.

(c) What is the margin achieved by the maximum-margin boundary as a function of h? [Hint: it turns out that the margin as a function of h is a linear function.]

(d) Assume that we can only observe the second component of the input vectors. Without the other component, the labeled training points reduce to (0, y = 1), (2, y = 1), (1, y = −1), and (3, y = −1). What is the lowest order p of polynomial kernel that would allow us to correctly classify these points?
The classes of the points on the line appear in the order 1, −1, 1, −1; therefore we need a cubic polynomial.

3. LDA. Using a set of 100 labeled training examples (two classes), we train the following models:
GaussI: a Gaussian model (one Gaussian per class), where the covariance matrices are both set to the identity I.
GaussX: a Gaussian model (one Gaussian per class) without any restrictions on the covariance matrices.
LinLog: a logistic regression model with linear features.
QuadLog: a logistic regression model using all linear and quadratic features.

(a) After training, we measure for each model the average log-probability of labels given the inputs in the training set. Specify all the equalities or inequalities that must hold between the models relative to this performance measure. We are looking for statements like "model 1 <= model 2" or "model 1 = model 2". If no such statement holds, write "none".
GaussI <= LinLog (both have logistic posteriors, and LinLog is the logistic model maximizing the average log-probabilities).
GaussX <= QuadLog (both have logistic posteriors with quadratic features, and QuadLog is the model of this class maximizing the average log-probabilities).
LinLog <= QuadLog (logistic regression models with linear features are a subclass of logistic regression models with quadratic features; the maximum over the superclass is at least as high as the maximum over the subclass).
GaussI <= QuadLog (follows from the above inequalities).
(GaussX will have higher average log joint probabilities of examples and labels than GaussI, but higher average log joint probabilities do not necessarily translate into higher average log conditional probabilities.)

(b) Which equalities and inequalities must always hold if we instead use the mean classification error in the training set as the performance measure? Again use the format "model 1 <= model 2" or "model 1 = model 2"; write "none" if no such statement holds.
None. Having higher average log conditional probabilities, or average log joint probabilities, does not necessarily translate into higher or lower classification error; counterexamples can be constructed for all pairs in both directions. Although no inequality always holds, it is commonly the case that GaussX <= GaussI and that QuadLog <= LinLog. (Partial credit of up to two points was awarded for these inequalities.)

5. We consider here generative and discriminative approaches for solving the classification problem illustrated in the figure (labeled training set, where "+" corresponds to class y = 1). Specifically, we will use a mixture of Gaussians model and regularized logistic regression models.

(a) We will first estimate a mixture of Gaussians model, one Gaussian per class, with the constraint that the covariance matrices are identity matrices. The mixing proportions (class frequencies) and the means of the two Gaussians are free parameters. Plot the maximum likelihood estimates of the means of the two class-conditional Gaussians in the figure. Mark the means as points "x" and label them "0" and "1" according to the class. Then draw the decision boundary in the same figure.
The means should be close to the centers of mass of the points. Since the two classes have the same number of points and the same covariance matrices, the decision boundary is a line and, moreover, should be drawn as the orthogonal bisector of the line segment connecting the class means.

(b) We have also trained regularized linear logistic regression models for the same data. The regularization penalties, used in penalized conditional log-likelihood estimation, were −C·w_i², where i = 0, 1, 2; in other words, only one of the parameters was regularized in each case. Based on the data in the figure, we generated three plots, one for each regularized parameter, of the number of misclassified training points as a function of C. The three plots are not identified with the corresponding parameters, however. Please assign the "top", "middle", and "bottom" plots to the correct parameter (w_0, w_1, or w_2) that was regularized in the plot, and provide a brief justification for each assignment.
"top" = w_1. By strongly regularizing w_1 we force the boundary to be horizontal in the figure. The logistic regression model tries to maximize the log-probability of classifying the data correctly; the highest penalty comes from the misclassified points, and thus the boundary will tend to balance the (worst) errors. In the figure this is, roughly speaking, the line x_2 = 1, resulting in 4 errors.
"middle" = w_0. If we regularize w_0, then the boundary will eventually go through the origin (bias term set to zero). Based on the figure we can find a good linear boundary through the origin with only one error.
"bottom" = w_2. The training error is unaffected if we regularize w_2 (constraining the boundary to be vertical); the value of w_2 would be small already without regularization.

4. (See midterm 2009, problem 4.)

6. Consider two classifiers: 1) an SVM with a quadratic (second-order polynomial) kernel function and 2) an unconstrained mixture of two Gaussians model, one Gaussian per class label. These classifiers map examples in R² to binary labels. We assume that the problem is separable, that no slack penalties are added to the SVM classifier, and that we have sufficiently many training examples to estimate the covariance matrices of the two Gaussian components.

(a) The two classifiers have the same VC-dimension. (T)

(b) Suppose we evaluated the structural risk minimization score for the two classifiers; the score is the bound on the expected loss of the classifier when the classifier is estimated on the basis of n training examples. Which of the two classifiers might yield the better (lower) score? Provide a brief justification.
The SVM would probably get a better score. Both classifiers have the same complexity penalty, but the SVM would better optimize the training error, resulting in a lower (or equal) overall score.

[final 2004] 2. We estimated a mixture of two Gaussians model based on the two-dimensional data shown in the figure below. The mixture was initialized randomly in two different ways and run for three iterations based on each initialization. However, the figures got mixed up (yes, again!). Please draw an arrow from one figure to another to indicate how they follow from each other (you should draw only four arrows).
(Figure: mixture model fit with EM, two initializations, three iterations each.)

We also wanted to try another two models based on the same n observations; you can assume that the parameters are unconstrained to the extent possible. How much higher log-likelihood would Model 2 have to assign to the training data for us to select this model with the Bayesian Information Criterion (BIC)?
Model 2 has one more parameter than Model 1, so the log-likelihood of Model 2 would have to overcome the additional complexity penalty of (1/2)·log(n) that it incurs in the BIC criterion.

Boosting

1. Figure 2: h_1 is chosen at the first iteration of boosting; what is the weight α_1 assigned to it?

Figure 2 shows a dataset of 8 points, equally divided among the two classes (positive and negative). The figure also shows a particular choice of decision stump h_1 picked by AdaBoost in the first iteration.
What is the weight α_1 that will be assigned to h_1 by AdaBoost? (Initial weights of all the data points are equal, i.e., 1/8.)
The weighted training error is ε_1 = 1/8, so

α_1 = (1/2)·log((1 − ε_1)/ε_1) = (1/2)·log((1 − 1/8)/(1/8)) = (1/2)·log 7.

2. AdaBoost will eventually reach zero training error, regardless of the type of weak classifier it uses, provided enough weak classifiers have been combined. (F)
Not if the data in the training set cannot be separated by a linear combination of the specific type of weak classifiers we are using.

3. The votes α_i assigned to the weak classifiers in boosting generally go down as the algorithm proceeds, because the weighted training error of the weak classifiers tends to go up. (T)
In the course of the boosting iterations the weak classifiers are forced to try to classify more difficult examples. The weights will increase for examples that are repeatedly misclassified by the weak component classifiers. The weighted training error of the components therefore tends to go up and, as a result, their votes go down.

4. The votes α_i assigned to the classifiers assembled by AdaBoost are always non-negative. (T)
As defined in class, AdaBoost will not choose a classifier with weighted training error above 1/2. This ensures that 1 − 2ε ≥ 0, and therefore the vote is non-negative. Note that if a classifier does worse than 1/2 we can always "flip" the sign of its predictions and thereby get a classifier that does slightly better than 1/2; the vote assigned to the "flipped" classifier would be non-negative.

2. Figure: labeled examples, weights on the examples, and three possible stumps.
In the figure, red 'o' points correspond to negative examples (y_t = −1) and blue '+' points are positive examples (y_t = +1). The figure also shows the normalized weights on the examples resulting from having run the AdaBoost algorithm for some number of iterations. There are also three decision stumps drawn in the figure, h(x; θ_A), h(x; θ_B), and h(x; θ_C), or A, B, and C for short.

(a) Which one of the stumps would you use at the next iteration (please answer A, B, or C)?
B. The weighted error of stump B is the lowest among the three stumps.

(b) Which one of the stumps was used at the previous iteration to obtain the weights on the examples shown in the figure (please answer A, B, or C)?
C. The stump that was selected at the previous round has to have weighted error exactly 1/2 at the current round. This is true for stump C.

(c) In the figure, circle the training point(s) (possibly none) that the ensemble h_2(x) = h(x; θ_A) + h(x; θ_C) cannot classify correctly.

3. For this problem we are given a training set D = {(x_1, y_1), ..., (x_n, y_n)} of examples and labels; we have no other data available. We will use boosting as a feature selection method for an SVM classifier. We follow the boosting algorithm for m rounds based on D to get m decision stumps h(x; θ_1), ..., h(x; θ_m) (we drop the "votes" generated by the boosting algorithm). After this we can collect the base classifier predictions into a feature vector for each training example t = 1, ..., n; note that each such feature vector has ±1 components. To train SVM classifiers based on these feature vectors we split the dataset D into two equal sets D_tr and D_te, use D_tr for training, and reserve D_te for evaluating the performance of the resulting classifier.

(a) Could we use the value of the margin obtained by the hard-margin SVM classifiers on D_tr as a criterion for selecting between the two kernels below? (N)

(b) Suppose we train an SVM classifier with kernel K1(x, x') based on D_tr and evaluate its performance on D_te. Does the performance on D_te provide a fair measure of how well the classifier is going to work on unseen examples (from the same distribution)?
No. Classification performance on D_te is not a fair measure, since the features (the stumps) that the SVM classifier relies on were estimated on the basis of D = {D_te, D_tr}, i.e., including D_te.

4. Consider building an ensemble of decision stumps with the AdaBoost algorithm. Figure 2 shows the labeled points in two dimensions as well as the first stump we have chosen. Stumps predict binary ±1 values and, as linear classifiers, depend only on one of the coordinate values. The little arrow in the figure is the
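To make the recurring quantities in this Boosting subsection concrete (the weighted error ε_t, the vote α_t = (1/2)·log((1 − ε_t)/ε_t), and the common multiplicative factor applied to the weights of misclassified points), here is a minimal NumPy AdaBoost sketch over one-dimensional decision stumps. The toy data, the threshold search, and all names are illustrative assumptions only and do not correspond to any figure above.

```python
import numpy as np

def stump_predict(x, thr, sign):
    # decision stump on a single 1-D feature: predict sign where x > thr, -sign otherwise
    return sign * np.where(x > thr, 1.0, -1.0)

def adaboost(x, y, rounds=5):
    n = len(x)
    w = np.full(n, 1.0 / n)                          # uniform initial weights
    thresholds = (np.sort(x)[:-1] + np.sort(x)[1:]) / 2.0
    ensemble = []
    for _ in range(rounds):
        # pick the stump with the smallest weighted training error
        best = min(((thr, s) for thr in thresholds for s in (+1, -1)),
                   key=lambda ts: np.sum(w * (stump_predict(x, *ts) != y)))
        pred = stump_predict(x, *best)
        eps = np.sum(w * (pred != y))                # weighted training error
        alpha = 0.5 * np.log((1.0 - eps) / eps)      # the vote; eps = 1/8 gives (1/2)*log(7)
        # every misclassified point is multiplied by the same factor exp(alpha),
        # every correctly classified point by exp(-alpha); then renormalize
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, best))
    return ensemble

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])
for t, (alpha, (thr, s)) in enumerate(adaboost(x, y), start=1):
    print(f"round {t}: threshold={thr:.1f} sign={s:+d} vote alpha={alpha:.3f}")
```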
