Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks
Harichandana Vejendla
(50478049)
Definitions
• Meta-Learning: Meta-learning describes machine learning algorithms that acquire knowledge from the outcomes of other learning processes. In the few-shot setting, a meta-learner uses experience across many training tasks to improve how a model adapts to new tasks.
• Few-shot Learning: Few-shot learning is a machine learning framework that enables a pre-trained model to generalize to new categories of data using only a few labeled samples per class.
• Feature Extraction: Feature extraction is a process of dimensionality reduction that transforms raw data into numerical features that can be processed.
• Feature Clustering: Feature clustering aggregates point features into groups whose members are similar to each other and dissimilar to members of other groups.
• Feature Representation: Representation learning, or feature learning, is the subdiscipline of machine learning that deals with extracting features or understanding the representation of a dataset.
Introduction
• Transfer Learning: pre-training a model on large auxiliary datasets and then fine-tuning the resulting model on the target task. This is a natural fit for few-shot learning, since only a few data samples are available in the target domain.
• Transfer learning from classically trained models yields poor performance on few-shot tasks. Recently, few-shot learning has been rapidly improved using meta-learning methods.
• This suggests that the feature representations learned by meta-learning must be fundamentally different from feature representations learned through conventional training.
• This paper examines the differences between features learned by meta-learning and by classical training.
• Based on this analysis, the paper proposes simple regularizers that appreciably boost few-shot performance.
Meta-Learning Framework
• In the context of few-shot learning, the objective of meta-learning algorithms is to produce a network that quickly adapts to new classes using little data.
• Meta-learning algorithms find parameters that can be fine-tuned in a few optimization steps, and on a few data points, to achieve good generalization.
• A task T_i is characterized as n-way, k-shot if the meta-learning algorithm must adapt to classify data from T_i after seeing k examples from each of the n classes in T_i.
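The n-way, k-shot setup can be sketched as a task-sampling routine (a minimal illustration; the dataset layout and helper names are hypothetical, not from the paper):

```python
import random

# Sample an n-way, k-shot task T_i: choose n classes, then k support
# examples per class, plus q held-out query examples for evaluation.

def sample_task(dataset, n=5, k=1, q=2, rng=random):
    classes = rng.sample(sorted(dataset), n)
    support, query = {}, {}
    for c in classes:
        examples = rng.sample(dataset[c], k + q)
        support[c], query[c] = examples[:k], examples[k:]
    return support, query

# Toy "dataset": class name -> list of example ids.
data = {f"class{i}": [f"c{i}_ex{j}" for j in range(10)] for i in range(8)}
support, query = sample_task(data, n=5, k=1, q=2, rng=random.Random(0))
print(len(support), len(next(iter(support.values()))))  # prints "5 1"
```

A 5-way 1-shot task thus contains exactly one labeled support example per class, which is what makes the feature representation so critical.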
Algorithm Description
• Meta-learning schemes typically rely on bi-level optimization problems with an inner loop and an outer loop.
• An iteration of the outer loop involves first sampling a "task," which comprises two sets of labeled data: the support data, T_i^s, and the query data, T_i^q.
• In the inner loop, the model being trained is fine-tuned using the support data.
• Fine-tuning produces new parameters θ_i that are a function of the original parameters and the support data.
• We evaluate the loss on the query data and compute the gradients w.r.t. the original parameters θ. This requires unrolling the fine-tuning steps and backpropagating through them.
• Finally, the routine moves back to the outer loop, where the meta-learning algorithm minimizes the loss on the query data with respect to the pre-fine-tuned weights. The base model's parameters are updated using these gradients.
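The bi-level structure above can be sketched on a toy problem (scalar linear tasks y = a·x, with analytic gradients; a simplification, not the paper's setup). For brevity this uses the first-order approximation, as in FOMAML, rather than backpropagating through the unrolled inner loop as full MAML does:

```python
import random

# Toy bi-level meta-learning loop: the inner loop fine-tunes on support
# data; the outer loop updates the base parameter from the query loss.

def grad(theta, xs, ys):
    # Gradient of the loss 0.5 * mean((theta * x - y)^2) w.r.t. theta.
    return sum((theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def sample_task(rng):
    # A "task" T_i is a slope a; support and query data are samples of it.
    a = rng.uniform(0.5, 1.5)
    support = [(x, a * x) for x in (1.0, 2.0)]
    query = [(x, a * x) for x in (3.0, 4.0)]
    return support, query

def meta_train(steps=300, inner_steps=3, alpha=0.05, beta=0.02, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # base model parameter
    for _ in range(steps):
        support, query = sample_task(rng)
        # Inner loop: fine-tune on the support data -> theta_i.
        theta_i = theta
        for _ in range(inner_steps):
            theta_i -= alpha * grad(theta_i, *zip(*support))
        # Outer loop: update the base parameter using the query loss
        # evaluated at the fine-tuned parameter (first-order update).
        theta -= beta * grad(theta_i, *zip(*query))
    return theta

# Task slopes lie in [0.5, 1.5]; the meta-learned initialization should
# settle inside that range so it adapts to any task in a few steps.
print(round(meta_train(), 2))
```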
Meta-Learning Algorithms
A variety of meta-learning algorithms exist, mostly differing in how the model is fine-tuned on the support data during the inner loop:
• MAML: updates all network parameters using gradient descent during fine-tuning.
• R2-D2 and MetaOptNet: last-layer meta-learning methods. They freeze the feature extractor's parameters during the inner loop; only the linear classifier layer is trained during fine-tuning.
• ProtoNet: a last-layer meta-learning method that classifies examples by the proximity of their features to class centroids. The extracted features are used to create class centroids, which then determine the network's class boundaries.
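The ProtoNet-style inner loop can be sketched as nearest-centroid classification (a toy illustration; the feature values and class names are made up, and a real feature extractor f_θ is assumed to have produced the vectors):

```python
import math

# ProtoNet-style classification: class centroids ("prototypes") are the
# means of the support features; a query is assigned to the class of
# the nearest centroid.

def centroid(features):
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]

def nearest_centroid(query_feat, centroids):
    # Euclidean distance to each class centroid; pick the closest class.
    return min(centroids, key=lambda c: math.dist(query_feat, centroids[c]))

# 2-way, 2-shot toy example with 2-D "features".
support = {
    "cat": [[0.0, 0.1], [0.2, -0.1]],
    "dog": [[2.0, 2.1], [1.8, 1.9]],
}
centroids = {c: centroid(fs) for c, fs in support.items()}
print(nearest_centroid([1.9, 2.0], centroids))  # prints "dog"
```

Because the class boundaries are entirely determined by the centroids, well-clustered features translate directly into stable decision boundaries.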
Few-Shot Datasets
• Mini-ImageNet: a pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes. These 100 classes are split into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
• CIFAR-FS: samples images from CIFAR-100. CIFAR-FS is split in the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
Comparison between Meta-Learning and Classical Training Models
• Dataset used: 1-shot mini-ImageNet.
• Classically trained models are trained using cross-entropy loss and SGD.
• Common fine-tuning procedures are used for both meta-learned and classically trained models for a fair comparison.
• Results show that meta-learning models outperform classically trained models on few-shot classification.
• This across-the-board performance advantage suggests that meta-learned features are qualitatively different from conventional features and fundamentally superior for few-shot learning.
Class Clustering in Feature Space
Measuring Clustering in Feature Space:
To measure feature clustering (FC), we consider the intra-class to inter-class variance ratio

    R_FC = [ (1/(CN)) Σ_i Σ_j ||φ_{i,j} − μ_i||² ] / [ (1/C) Σ_i ||μ_i − μ||² ]

where
fθ - feature extractor, with fθ(x_{i,j}) = φ_{i,j}
x_{i,j} - training data point j in class i
φ_{i,j} - feature vector corresponding to data point j in class i
μ_i - mean of feature vectors in class i
μ - mean across all feature vectors
C - number of classes
N - number of data points per class

Low values of this ratio correspond to collections of features such that classes are well separated, and a hyperplane formed by choosing a point from each of two classes does not vary dramatically with the choice of samples.
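The ratio can be computed directly from extracted features; a minimal sketch in pure Python (the nested-list feature layout and helper names are illustrative, not from the paper):

```python
# Compute the intra-class to inter-class variance ratio R_FC.
# features[i][j] plays the role of phi_{i,j}: feature j of class i.

def mean(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def feature_clustering_ratio(features):
    C, N = len(features), len(features[0])
    mus = [mean(cls) for cls in features]            # per-class means mu_i
    mu = mean([v for cls in features for v in cls])  # global mean mu
    intra = sum(sq_dist(v, mus[i])
                for i, cls in enumerate(features) for v in cls) / (C * N)
    inter = sum(sq_dist(m, mu) for m in mus) / C
    return intra / inter

# Tight, well-separated classes -> small ratio.
tight = [[[0.0, 0.0], [0.1, 0.0]], [[5.0, 5.0], [5.1, 5.0]]]
# Spread-out, overlapping classes -> large ratio.
loose = [[[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [5.0, 5.0]]]
print(feature_clustering_ratio(tight) < feature_clustering_ratio(loose))  # prints True
```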
Why is clustering important?
• As features in a class become spread out and the classes are brought closer together, the classification boundaries formed by sampling one-shot data often misclassify large regions.
• As features in a class are compacted and classes move farther apart from each other, the intra-class to inter-class variance ratio drops, and the dependence of the class boundary on the choice of one-shot samples becomes weaker.
Comparing Feature Representations of Meta-Learning and Classically Trained Models
• Three classes are randomly chosen from the test set, and 100 samples are taken from each class. The samples are then passed through the feature extractor, and the resulting vectors are plotted.
• Because feature space is high-dimensional, we perform a linear projection onto the first two component vectors determined by LDA.
• Linear discriminant analysis (LDA) projects data onto directions that minimize the intra-class to inter-class variance ratio.
• The classically trained model mashes features together, while the meta-learned models draw the classes farther apart.
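For intuition, the LDA direction used in such plots can be computed by hand in the two-class, 2-D case as w = S_W⁻¹(μ₁ − μ₀), where S_W is the within-class scatter matrix (a sketch; the example data are made up, and real visualizations project high-dimensional features onto the top two LDA components):

```python
# Fisher discriminant direction for two classes of 2-D points.

def fisher_direction(class0, class1):
    def mean(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(2)]
    m0, m1 = mean(class0), mean(class1)
    # Within-class scatter S_W = sum over both classes of (v-m)(v-m)^T.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for vs, m in ((class0, m0), (class1, m1)):
        for v in vs:
            d = [v[0] - m[0], v[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    # Invert the 2x2 scatter matrix (small ridge for numerical stability).
    s[0][0] += 1e-6
    s[1][1] += 1e-6
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    dm = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * dm[0] + inv[0][1] * dm[1],
            inv[1][0] * dm[0] + inv[1][1] * dm[1]]

# Classes separated along x; the LDA direction should be mostly along x.
c0 = [[0.0, -1.0], [0.2, 1.0], [-0.2, 0.0]]
c1 = [[3.0, -1.0], [3.2, 1.0], [2.8, 0.0]]
w = fisher_direction(c0, c1)
print(abs(w[0]) > abs(w[1]))  # prints True
```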
Hyperplane Invariance
We next consider a regularizer that penalizes variations in the maximum-margin hyperplane separating feature vectors in opposite classes.
Hyperplane Variation Regularizer:
Data points in class A: x1, x2
Data points in class B: y1, y2
fθ - feature extractor
fθ(x1) − fθ(y1) determines the direction of the maximum-margin hyperplane separating the two points in feature space.

    R_HV(x1, x2, y1, y2) = ||v1 − v2|| / (||v1|| + ||v2||),  where v_k = fθ(xk) − fθ(yk)

• This function measures the distance between the difference vectors v1 and v2 relative to their size.
• In practice, during a batch of training, we sample many pairs of classes and two samples from each class. We then compute R_HV on all class pairs and add these terms to the cross-entropy loss.
• We find that this regularizer performs almost as well as the feature clustering regularizer and conclusively outperforms non-regularized classical training.
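The penalty can be sketched directly from the description above (assuming the relative-distance form ||v1 − v2|| / (||v1|| + ||v2||); the exact normalization in the original slides is not recoverable, so treat it as an assumption):

```python
import math

# Hyperplane-variation penalty: v_k = f(x_k) - f(y_k) determines the
# direction of the max-margin hyperplane separating the pair, and the
# penalty measures how much v_1 and v_2 differ relative to their size.

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def hyperplane_variation(fx1, fx2, fy1, fy2):
    v1, v2 = sub(fx1, fy1), sub(fx2, fy2)
    return norm(sub(v1, v2)) / (norm(v1) + norm(v2))

# Both pairs induce the same separating direction -> zero penalty.
print(hyperplane_variation([1.0, 0.0], [2.0, 0.0], [0.0, 0.0], [1.0, 0.0]))  # prints 0.0
```

During training, this scalar would be averaged over many sampled class pairs and added to the cross-entropy loss.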
Experiments
• Feature clustering and hyperplane variation values are computed.
• These two quantities measure the intra-class to inter-class variance ratio and the invariance of separating hyperplanes.
• Lower values of each measurement correspond to better class separation.
• On both CIFAR-FS and mini-ImageNet, the meta-learned models attain lower values, indicating that feature-space clustering plays a role in the effectiveness of meta-learning.
Experiments
• We incorporate these regularizers into the standard training routine of the classically trained model.
• In all experiments, feature clustering improves the performance of transfer learning, sometimes even achieving higher performance than meta-learning.
Weight Clustering: Finding Clusters of Local Minima for Task Losses in Parameter Space
• Since Reptile does not fix the feature extractor during fine-tuning, it must find parameters that adapt easily to new tasks.
• We hypothesize that Reptile finds parameters that lie very close to good minima for many tasks and is therefore able to perform well on these tasks after very little fine-tuning.
• This hypothesis is further motivated by the close relationship between Reptile and consensus optimization.
• In a consensus method, a number of models are independently optimized with their own task-specific parameters, and the tasks communicate via a penalty that encourages all the individual solutions to converge around a common value.
Consensus Formulation:
• Reptile can be interpreted as approximately minimizing a consensus formulation.
• Reptile diverges from a traditional consensus optimizer only in that it does not explicitly consider the quadratic penalty term when minimizing for the task-specific parameters θ̃_p.
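A standard consensus objective (a generic sketch; the penalty weight ρ and the averaging over p tasks are assumptions, not the slides' exact expression) couples task-specific parameters θ̃_i to shared parameters θ through a quadratic penalty:

```latex
\min_{\theta,\;\tilde{\theta}_1,\dots,\tilde{\theta}_p}\;
\frac{1}{p}\sum_{i=1}^{p}\left[
  \mathcal{L}_i\bigl(\tilde{\theta}_i\bigr)
  + \frac{\rho}{2}\,\bigl\lVert \theta - \tilde{\theta}_i \bigr\rVert^{2}
\right]
```

Each task i minimizes its own loss L_i over θ̃_i, while the quadratic term pulls all task solutions toward the shared value θ; Reptile effectively ignores this term when computing the θ̃_i.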
Consensus Optimization Improves Reptile
• We modify Reptile to explicitly enforce parameter clustering around a consensus value.
• We find that directly optimizing the consensus formulation leads to improved performance.
• During each inner-loop update step in Reptile, we penalize the squared distance from the parameters for the current task to the average of the parameters across all tasks in the current batch.
• This is equivalent to the original Reptile when the penalty coefficient α = 0. We call this method "Weight-Clustering."
Reptile with Weight-Clustering Regularizer
n - number of meta-training steps
k - number of iterations or steps to perform within each meta-training step
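The regularized inner loop can be sketched on toy scalar tasks (an illustration, not the paper's implementation; the task losses, learning rates, and penalty weight alpha are made up):

```python
import random

# Reptile with a weight-clustering penalty: within a meta-batch, each
# task's parameters take k inner SGD steps, and every step adds a pull
# (strength alpha) toward the batch-average parameters. Setting
# alpha = 0 recovers plain Reptile. Tasks are scalar least-squares
# problems with optimum a, so the inner gradient is analytic.

def inner_grad(theta, a):
    # d/dtheta of the task loss 0.5 * (theta - a)^2.
    return theta - a

def reptile_step(theta, task_as, k=5, lr=0.2, alpha=0.5, eps=0.5):
    params = [theta] * len(task_as)
    for _ in range(k):
        avg = sum(params) / len(params)
        # Each task's update includes the consensus pull toward avg.
        params = [p - lr * (inner_grad(p, a) + alpha * (p - avg))
                  for p, a in zip(params, task_as)]
    # Reptile outer update: move theta toward the fine-tuned parameters.
    return theta + eps * (sum(params) / len(params) - theta)

rng = random.Random(0)
theta = 0.0
for _ in range(100):  # n meta-training steps
    tasks = [rng.uniform(0.5, 1.5) for _ in range(4)]  # batch of task optima
    theta = reptile_step(theta, tasks)
print(round(theta, 2))
```

Task optima lie in [0.5, 1.5], so the meta-trained parameter should settle inside that range, near the cluster of task minima.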
Results of Weight Clustering
• We compare the performance of our regularized Reptile algorithm to that of the original Reptile method, as well as first-order MAML (FOMAML) and a classically trained model of the same architecture. We test these methods on a sample of 100,000 5-way 1-shot and 5-shot mini-ImageNet tasks.
• Reptile with Weight-Clustering achieves higher performance.
Results of Weight Clustering
• Parameters of networks trained using our regularized version of Reptile do not travel as far during fine-tuning at inference as those trained using vanilla Reptile.
• From this, we conclude that our regularizer does indeed move model parameters toward a consensus position from which little movement is needed to adapt to new tasks.