




Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks
Harichandana Vejendla (50478049)
Definitions
• Meta-Learning: Meta-learning describes machine learning algorithms that acquire knowledge and understanding from the outcomes of other machine learning algorithms; in other words, they "learn to learn" from the results of other learning runs rather than from a single task alone.
• Few-shot Learning: Few-shot learning is a machine learning framework that enables a pre-trained model to generalize to new categories of data using only a few labeled samples per class.
• Feature Extraction: Feature extraction is a process of dimensionality reduction that involves transforming raw data into numerical features that can be processed.
• Feature Clustering: Feature clustering aggregates feature vectors into groups whose members are similar to each other and dissimilar to members of other groups.
• Feature Representation: Representation learning, or feature learning, is the subdiscipline of machine learning that deals with extracting features from, and understanding the representation of, a dataset.
Introduction
• Transfer Learning: pre-training a model on large auxiliary datasets and then fine-tuning the resulting model on the target task. This is used for few-shot learning, since only a few data samples are available in the target domain.
• Transfer learning from classically trained models yields poor performance on few-shot tasks. Recently, few-shot learning has improved rapidly with meta-learning methods.
• This suggests that the feature representations learned by meta-learning must be fundamentally different from those learned through conventional training.
• This paper investigates the differences between features learned by meta-learning and by classical training.
• Based on these findings, the paper proposes simple regularizers that appreciably boost few-shot performance.
Meta-Learning Framework
• In the context of few-shot learning, the objective of meta-learning algorithms is to produce a network that quickly adapts to new classes using little data.
• Meta-learning algorithms find parameters that can be fine-tuned in a few optimization steps, and on a few data points, to achieve good generalization.
• A task T_i is characterized as n-way, k-shot if the meta-learning algorithm must adapt to classify data from T_i after seeing k examples from each of the n classes in T_i.
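The n-way, k-shot task construction above can be sketched in a few lines. This is an illustrative helper, not code from the paper; `sample_task` and its arguments are hypothetical names, assuming the dataset is a list of (example, label) pairs:

```python
import random
from collections import defaultdict

def sample_task(dataset, n_way, k_shot, q_queries):
    """Sample one n-way, k-shot task T_i from a labeled dataset.

    dataset: list of (x, label) pairs.
    Returns (support, query): k_shot support and q_queries query
    examples per class, for n_way randomly chosen classes.
    """
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)
    classes = random.sample(sorted(by_class), n_way)  # pick n classes
    support, query = [], []
    for y in classes:
        xs = random.sample(by_class[y], k_shot + q_queries)
        support += [(x, y) for x in xs[:k_shot]]   # k examples per class
        query += [(x, y) for x in xs[k_shot:]]     # held-out evaluation
    return support, query
```

The support set is what the inner loop fine-tunes on; the query set is what the outer loop evaluates.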
Algorithm Description
• Meta-learning schemes typically rely on bi-level optimization problems with an inner loop and an outer loop.
• An iteration of the outer loop involves first sampling a "task," which comprises two sets of labeled data: the support data, T_i^s, and the query data, T_i^q.
• In the inner loop, the model being trained is fine-tuned using the support data.
• Fine-tuning produces new parameters θ_i that are a function of the original parameters and the support data.
• We evaluate the loss on the query data and compute its gradient w.r.t. the original parameters θ. This requires unrolling the fine-tuning steps and backpropagating through them.
• Finally, the routine moves back to the outer loop, where the meta-learning algorithm minimizes the loss on the query data with respect to the pre-fine-tuned weights; the base model parameters are updated using these gradients.
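The inner/outer loop structure can be sketched on a toy linear model. Note one deliberate simplification, flagged here because the description above calls for full unrolling: this sketch uses the first-order shortcut (evaluating the query gradient at the adapted weights, as in FOMAML) instead of backpropagating through the unrolled fine-tuning steps. All names are illustrative:

```python
import numpy as np

def loss_and_grad(theta, X, y):
    """Squared-error loss and gradient for a linear model f(x) = X @ theta."""
    resid = X @ theta - y
    return 0.5 * np.mean(resid ** 2), X.T @ resid / len(y)

def meta_step(theta, tasks, inner_lr=0.1, outer_lr=0.01, inner_steps=5):
    """One outer-loop iteration of first-order meta-learning.

    tasks: list of ((Xs, ys), (Xq, yq)) support/query pairs.
    The inner loop fine-tunes a copy of theta on the support data;
    the outer update uses the query-loss gradient at the adapted
    weights (first-order approximation, not full unrolling).
    """
    meta_grad = np.zeros_like(theta)
    for (Xs, ys), (Xq, yq) in tasks:
        theta_i = theta.copy()
        for _ in range(inner_steps):            # inner loop: adapt to task
            _, g = loss_and_grad(theta_i, Xs, ys)
            theta_i -= inner_lr * g
        _, gq = loss_and_grad(theta_i, Xq, yq)  # query loss at adapted weights
        meta_grad += gq
    return theta - outer_lr * meta_grad / len(tasks)
```

Exact MAML would instead differentiate the query loss through the inner-loop updates back to the original θ.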
Meta-Learning Algorithms
A variety of meta-learning algorithms exist, mostly differing in how the model is fine-tuned on the support data during the inner loop:
• MAML: updates all network parameters using gradient descent during fine-tuning.
• R2-D2 and MetaOptNet: last-layer meta-learning methods. The feature extractor's parameters are frozen during the inner loop; only the linear classifier layer is trained during fine-tuning.
• ProtoNet: a last-layer meta-learning method that classifies examples by the proximity of their features to class centroids. The extracted features are used to create class centroids, which then determine the network's class boundaries.
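ProtoNet's centroid rule is simple enough to sketch directly. This is a minimal NumPy illustration of nearest-centroid classification in feature space (function names are my own, and distances here are plain squared Euclidean):

```python
import numpy as np

def protonet_predict(support_feats, support_labels, query_feats):
    """Nearest-centroid classification in feature space (ProtoNet-style).

    Builds one centroid per class by averaging the support features,
    then assigns each query feature to the class whose centroid is
    closest in squared Euclidean distance.
    """
    classes = np.unique(support_labels)
    centroids = np.stack([support_feats[support_labels == c].mean(axis=0)
                          for c in classes])
    # distance from every query point to every class centroid
    d = ((query_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]
```

Because only the centroids change per task, the feature extractor itself needs no inner-loop gradient steps.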
Few-Shot Datasets
• mini-ImageNet: a pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes. These 100 classes are split into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
• CIFAR-FS: samples images from CIFAR-100. CIFAR-FS is split in the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
Comparison between Meta-Learning and Classical Training Models
• Dataset used: 1-shot mini-ImageNet.
• Classically trained models are trained using cross-entropy loss and SGD.
• Common fine-tuning procedures are used for both meta-learned and classically trained models for a fair comparison.
• Results show that meta-learned models outperform classically trained models on few-shot classification.
• This across-the-board performance advantage suggests that meta-learned features are qualitatively different from conventional features and fundamentally superior for few-shot learning.
Class Clustering in Feature Space
Measuring Clustering in Feature Space:
To measure feature clustering (FC), we consider the intra-class to inter-class variance ratio:

    R_FC = [ (1/(CN)) Σ_i Σ_j ||φ_{i,j} − μ_i||² ] / [ (1/C) Σ_i ||μ_i − μ||² ]

where f_θ(x_{i,j}) = φ_{i,j}, and
- f_θ - feature extractor
- x_{i,j} - training data point j in class i
- φ_{i,j} - feature vector corresponding to data point x_{i,j}
- μ_i - mean of feature vectors in class i
- μ - mean across all feature vectors
- C - number of classes
- N - number of data points per class

Low values of this ratio correspond to collections of features in which classes are well-separated, so that a hyperplane formed by choosing a point from each of two classes does not vary dramatically with the choice of samples.
Why is Clustering Important?
• As features within a class become spread out and the classes are brought closer together, the classification boundaries formed by sampling one-shot data often misclassify large regions.
• As features within a class are compacted and classes move farther apart, the intra-class to inter-class variance ratio drops, and the dependence of the class boundary on the choice of one-shot samples becomes weaker.
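The variance ratio above is straightforward to compute on a batch of extracted features. A minimal NumPy sketch (function name is my own; it assumes equal-sized classes, matching the definition's per-class N):

```python
import numpy as np

def variance_ratio(feats, labels):
    """Intra-class to inter-class variance ratio of feature vectors.

    Lower values mean classes are internally tight and sit far apart,
    i.e. better feature clustering for few-shot classification.
    """
    classes = np.unique(labels)
    mu = feats.mean(axis=0)                          # global feature mean
    intra, inter = 0.0, 0.0
    for c in classes:
        fc = feats[labels == c]
        mu_c = fc.mean(axis=0)
        intra += ((fc - mu_c) ** 2).sum(-1).mean()   # spread within class c
        inter += ((mu_c - mu) ** 2).sum()            # spread of class means
    return (intra / len(classes)) / (inter / len(classes))
```

Compacting each class or pushing class means apart both drive this ratio down.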
Comparing Feature Representations of Meta-Learned and Classically Trained Models
• Three classes are randomly chosen from the test set, and 100 samples are taken from each class. The samples are passed through the feature extractor, and the resulting vectors are plotted.
• Because the feature space is high-dimensional, we perform a linear projection onto the first two component vectors determined by LDA.
• Linear discriminant analysis (LDA) projects data onto directions that minimize the intra-class to inter-class variance ratio.
• The classically trained model mashes features together, while the meta-learned models draw the classes farther apart.
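The LDA projection used for these plots can be reproduced with scatter matrices. This is a plain-NumPy sketch of the classic formulation (scikit-learn's `LinearDiscriminantAnalysis` would do the same job; names and the small ridge on S_w are my choices):

```python
import numpy as np

def lda_project(feats, labels, n_dims=2):
    """Project features onto the top LDA directions.

    LDA directions maximize between-class scatter relative to
    within-class scatter, so the projection preserves exactly the
    class-separation structure the variance ratio measures.
    """
    classes = np.unique(labels)
    mu = feats.mean(axis=0)
    d = feats.shape[1]
    Sw = np.zeros((d, d))                      # within-class scatter
    Sb = np.zeros((d, d))                      # between-class scatter
    for c in classes:
        fc = feats[labels == c]
        mu_c = fc.mean(axis=0)
        Sw += (fc - mu_c).T @ (fc - mu_c)
        diff = (mu_c - mu)[:, None]
        Sb += len(fc) * (diff @ diff.T)
    # generalized eigenproblem Sb v = lambda Sw v; tiny ridge keeps Sw invertible
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + 1e-6 * np.eye(d), Sb))
    order = np.argsort(evals.real)[::-1]
    W = evecs.real[:, order[:n_dims]]          # top n_dims discriminant axes
    return feats @ W
```

Plotting the two projected coordinates, colored by class, gives the qualitative comparison described above.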
Hyperplane Invariance
We replace the feature clustering regularizer with one that penalizes variations in the maximum-margin hyperplane separating feature vectors from opposite classes.

Hyperplane Variation Regularizer:
- Data points in class A: x1, x2
- Data points in class B: y1, y2
- f_θ - feature extractor
- f_θ(x1) − f_θ(y1) determines the direction of the maximum-margin hyperplane separating the two points in feature space.

With v1 = f_θ(x1) − f_θ(y1) and v2 = f_θ(x2) − f_θ(y2), the regularizer R_HV measures the distance between the difference vectors v1 and v2 relative to their size, e.g. R_HV = ||v1 − v2|| / (||v1|| + ||v2||).
• In practice, during a batch of training, we sample many pairs of classes and two samples from each class. We then compute R_HV on all class pairs and add these terms to the cross-entropy loss.
• We find that this regularizer performs almost as well as the feature clustering regularizer and conclusively outperforms non-regularized classical training.
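A per-pair term of this regularizer is a one-liner. The sum-of-norms denominator below is one plausible way to normalize "relative to their size"; the paper's exact normalization may differ, and the function name is my own:

```python
import numpy as np

def hyperplane_variation(f_x1, f_y1, f_x2, f_y2):
    """Hyperplane variation penalty for two sampled points per class.

    v1 = f(x1) - f(y1) and v2 = f(x2) - f(y2) set the directions of
    the max-margin hyperplanes separating the sampled pairs; the
    penalty is the distance between these difference vectors,
    normalized by their combined magnitude. Zero means the separating
    hyperplane does not depend on which samples were drawn.
    """
    v1, v2 = f_x1 - f_y1, f_x2 - f_y2
    return np.linalg.norm(v1 - v2) / (np.linalg.norm(v1) + np.linalg.norm(v2))
```

During training, this term would be averaged over many sampled class pairs and added to the cross-entropy loss.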
Experiments
• Feature clustering (FC) and hyperplane variation (HV) values are computed for the trained models.
• These two quantities measure the intra-class to inter-class variance ratio and the invariance of separating hyperplanes.
• Lower values of each measurement correspond to better class separation.
• On both CIFAR-FS and mini-ImageNet, the meta-learned models attain lower values, indicating that feature-space clustering plays a role in the effectiveness of meta-learning.
Experiments
• We incorporate these regularizers into the standard training routine of the classically trained model.
• In all experiments, feature clustering improves the performance of transfer learning, and it sometimes even achieves higher performance than meta-learning.
Weight Clustering: Finding Clusters of Local Minima for Task Losses in Parameter Space
• Since Reptile does not fix the feature extractor during fine-tuning, it must find parameters that adapt easily to new tasks.
• We hypothesize that Reptile finds parameters that lie very close to good minima for many tasks and is therefore able to perform well on these tasks after very little fine-tuning.
• This hypothesis is further motivated by the close relationship between Reptile and consensus optimization.
• In a consensus method, a number of models are independently optimized with their own task-specific parameters, and the tasks communicate via a penalty that encourages all the individual solutions to converge around a common value.
Consensus Formulation:
• Reptile can be interpreted as approximately minimizing the consensus formulation

    min over θ, {θ̃_p} of Σ_p [ L_p(θ̃_p) + (α/2) ||θ − θ̃_p||² ],

where θ̃_p are the task-specific parameters for task p, L_p is the loss on task p, and θ is the consensus value.
• Reptile diverges from a traditional consensus optimizer only in that it does not explicitly consider the quadratic penalty term when minimizing for θ̃_p.
Consensus Optimization Improves Reptile
• We modify Reptile to explicitly enforce parameter clustering around a consensus value.
• We find that directly optimizing the consensus formulation leads to improved performance.
• During each inner-loop update step in Reptile, we penalize the squared distance from the parameters for the current task to the average of the parameters across all tasks in the current batch.
• This is equivalent to the original Reptile when α = 0. We call this method "Weight-Clustering."

Reptile with Weight-Clustering Regularizer
- n - number of meta-training steps
- k - number of iterations or steps to perform within each meta-training step
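One meta-training step of this scheme can be sketched as follows. This is my own simplified reading of the procedure, not the paper's algorithm verbatim: tasks in a batch are adapted in lockstep, each inner update adds the gradient of the quadratic consensus penalty, and the outer update is the usual Reptile move toward the average adapted parameters.

```python
import numpy as np

def weight_clustered_reptile_step(theta, tasks, grad_fn,
                                  inner_lr=0.05, outer_lr=0.5,
                                  inner_steps=5, alpha=0.1):
    """One meta-training step of Reptile with Weight-Clustering.

    Each task p keeps its own parameters theta_p. At every inner step
    we add alpha * (theta_p - mean of all theta_p), the gradient of the
    quadratic consensus penalty, pulling task parameters toward their
    batch average. Setting alpha = 0 recovers plain Reptile.
    """
    task_params = [theta.copy() for _ in tasks]
    for _ in range(inner_steps):
        mean_params = np.mean(task_params, axis=0)   # consensus value
        for p, data in enumerate(tasks):
            g = grad_fn(task_params[p], data)        # task-specific gradient
            g = g + alpha * (task_params[p] - mean_params)
            task_params[p] = task_params[p] - inner_lr * g
    # Reptile outer update: move toward the average adapted parameters
    return theta + outer_lr * (np.mean(task_params, axis=0) - theta)
```

With the penalty active, the adapted parameters for different tasks end their inner loops closer together, which is exactly the clustering of task minima the hypothesis above calls for.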
Results of Weight Clustering
• We compare the performance of our regularized Reptile algorithm to that of the original Reptile method, as well as to first-order MAML (FOMAML) and a classically trained model of the same architecture. We test these methods on a sample of 100,000 5-way 1-shot and 5-shot mini-ImageNet tasks.
• Reptile with Weight-Clustering achieves higher performance.
Results of Weight Clustering
• Parameters of networks trained using our regularized version of Reptile do not travel as far during fine-tuning at inference as those trained using vanilla Reptile.
• From this, we conclude that our regularizer does indeed move model parameters closer to good minima for many tasks, so that little fine-tuning is needed at inference.