Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks
Harichandana Vejendla (50478049)
Definitions
• Meta-Learning: Meta-learning describes machine learning algorithms that acquire knowledge and understanding from the outcomes of other machine learning algorithms; they learn how best to use what those algorithms produce.
• Few-Shot Learning: Few-shot learning is a machine learning framework that enables a pre-trained model to generalize to new categories of data using only a few labeled samples per class.
• Feature Extraction: Feature extraction is a dimensionality-reduction process that transforms raw data into numerical features that can be processed.
• Feature Clustering: Feature clustering aggregates feature vectors into groups whose members are similar to each other and dissimilar to members of other groups.
• Feature Representation: Representation learning (or feature learning) is the subdiscipline of machine learning that deals with extracting features from, or understanding the representation of, a dataset.
Introduction
• Transfer Learning: Pre-training a model on large auxiliary datasets and then fine-tuning the resulting model on the target task. This is a natural fit for few-shot learning, since only a few data samples are available in the target domain.
• Transfer learning from classically trained models yields poor performance on few-shot tasks. Recently, few-shot learning has improved rapidly through meta-learning methods.
• This suggests that the feature representations learned by meta-learning must be fundamentally different from those learned through conventional training.
• This paper investigates the differences between features learned by meta-learning and by classical training.
• Based on these findings, the paper proposes simple regularizers that appreciably boost few-shot performance.
Meta-Learning Framework
• In the context of few-shot learning, the objective of meta-learning algorithms is to produce a network that quickly adapts to new classes using little data.
• Meta-learning algorithms find parameters that can be fine-tuned in a few optimization steps, on a few data points, to achieve good generalization.
• A task T_i is characterized as n-way, k-shot if the meta-learning algorithm must adapt to classify data from T_i after seeing k examples from each of the n classes in T_i.
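The n-way, k-shot setup above can be made concrete with a small episode sampler. This is an illustrative sketch, not code from the paper: the toy dataset, the `q_queries` query count, and all names are assumptions.

```python
import random

def sample_task(dataset, n_way=5, k_shot=1, q_queries=15, rng=None):
    """Sample one n-way, k-shot episode from {class_label: [examples]}.

    Returns (support, query) lists of (example, episode_label) pairs,
    where episode labels are remapped to 0..n_way-1.
    """
    rng = rng or random.Random(0)
    classes = rng.sample(sorted(dataset), n_way)       # pick n classes
    support, query = [], []
    for new_label, cls in enumerate(classes):
        examples = rng.sample(dataset[cls], k_shot + q_queries)
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query

# Toy dataset: 10 classes with 30 "images" (here just integers) each.
data = {c: list(range(c * 100, c * 100 + 30)) for c in range(10)}
s, q = sample_task(data, n_way=5, k_shot=1, q_queries=15)
print(len(s), len(q))  # 5 support examples, 75 query examples
```

The support set plays the role of T_i^s in the inner loop and the query set the role of T_i^q in the outer loop, as described in the algorithm section.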
Algorithm
Algorithm Description
• Meta-learning schemes typically rely on bi-level optimization problems with an inner loop and an outer loop.
• An iteration of the outer loop involves first sampling a "task," which comprises two sets of labeled data: the support data, T_i^s, and the query data, T_i^q.
• In the inner loop, the model being trained is fine-tuned using the support data.
• Fine-tuning produces new parameters θ_i that are a function of the original parameters and the support data.
• We evaluate the loss on the query data and compute its gradient w.r.t. the original parameters θ; this requires unrolling the fine-tuning steps and backpropagating through them.
• Finally, the routine returns to the outer loop, where the meta-learning algorithm minimizes the loss on the query data with respect to the pre-fine-tuned weights. The base model parameters are updated using these gradients.
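The bi-level loop above can be sketched on a toy problem. This uses a first-order approximation of the outer gradient (as in FOMAML) rather than unrolling the inner steps; the linear least-squares "tasks," learning rates, and step counts are all illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(theta, X, y):
    # Squared-error loss for a linear model; stands in for the network loss.
    err = X @ theta - y
    return 0.5 * np.mean(err ** 2), X.T @ err / len(y)

theta = np.zeros(3)                 # meta-parameters (outer loop)
inner_lr, outer_lr, steps = 0.1, 0.05, 5

for it in range(200):               # outer loop: one sampled task per iteration
    w_true = rng.normal(size=3)                  # toy task definition
    Xs, Xq = rng.normal(size=(10, 3)), rng.normal(size=(20, 3))
    ys, yq = Xs @ w_true, Xq @ w_true            # support / query targets

    phi = theta.copy()
    for _ in range(steps):                       # inner loop: fine-tune on support
        _, g = loss_and_grad(phi, Xs, ys)
        phi -= inner_lr * g

    # First-order outer update: query-loss gradient evaluated at the
    # fine-tuned parameters, applied to theta. (Full MAML would unroll
    # the inner steps and backpropagate through them.)
    _, gq = loss_and_grad(phi, Xq, yq)
    theta -= outer_lr * gq
```

The inner loop touches only support data and the outer update only query data, mirroring the T_i^s / T_i^q split described above.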
Meta-Learning Algorithms
A variety of meta-learning algorithms exist, differing mostly in how the model is fine-tuned on the support data during the inner loop:
• MAML: Updates all network parameters using gradient descent during fine-tuning.
• R2-D2 and MetaOptNet: Last-layer meta-learning methods that only train the last layer. The feature extractor's parameters are frozen during the inner loop; only the linear classifier layer is trained during fine-tuning.
• ProtoNet: A last-layer meta-learning method that classifies examples by the proximity of their features to class centroids. The extracted features are used to build class centroids, which then determine the network's class boundaries.
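The ProtoNet-style centroid classification can be sketched as nearest-centroid assignment in feature space. The toy 2-D features below are illustrative; in the paper the features would come from the frozen feature extractor f_θ.

```python
import numpy as np

def protonet_predict(support_feats, support_labels, query_feats):
    """Nearest-centroid (prototype) classification in feature space."""
    classes = np.unique(support_labels)
    # Class centroids (prototypes) from the support features.
    centroids = np.stack([support_feats[support_labels == c].mean(axis=0)
                          for c in classes])
    # Assign each query to the class of its nearest centroid.
    d = np.linalg.norm(query_feats[:, None, :] - centroids[None], axis=-1)
    return classes[d.argmin(axis=1)]

# Toy 2-way example with well-separated 2-D features.
feats = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
labels = np.array([0, 0, 1, 1])
queries = np.array([[0., 0.5], [5., 5.5]])
print(protonet_predict(feats, labels, queries))  # [0 1]
```

No gradient steps are needed in the inner loop: the centroids alone determine the class boundaries.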
Few-Shot Datasets
• Mini-ImageNet: A pruned and downsized version of the ImageNet classification dataset, consisting of 60,000 84×84 RGB color images from 100 classes. These 100 classes are split into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
• CIFAR-FS: Samples images from CIFAR-100. CIFAR-FS is split the same way as mini-ImageNet, with 60,000 32×32 RGB color images from 100 classes divided into 64, 16, and 20 classes for the training, validation, and testing sets, respectively.
Comparison between Meta-Learning and Classically Trained Models
• Dataset used: 1-shot mini-ImageNet.
• Classically trained models are trained using cross-entropy loss and SGD.
• Common fine-tuning procedures are used for both meta-learned and classically trained models for a fair comparison.
• Results show that meta-learned models outperform classically trained models on few-shot classification.
• This across-the-board performance advantage suggests that meta-learned features are qualitatively different from conventional features and fundamentally better suited to few-shot learning.
Class Clustering in Feature Space
Measuring Clustering in Feature Space:
To measure feature clustering (FC), we consider the intra-class to inter-class variance ratio

R_FC = [ (1/(CN)) Σ_{i=1}^{C} Σ_{j=1}^{N} ||φ_{i,j} − μ_i||² ] / [ (1/C) Σ_{i=1}^{C} ||μ_i − μ||² ]

where
• φ_{i,j} = f_θ(x_{i,j}) is the feature vector of training data point x_{i,j} in class i, with f_θ the feature extractor,
• μ_i is the mean of the feature vectors in class i,
• μ is the mean across all feature vectors,
• C is the number of classes, and
• N is the number of data points per class.
Low values of this ratio correspond to collections of features in which the classes are well separated, so that a hyperplane formed by choosing a point from each of two classes does not vary dramatically with the choice of samples.
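The ratio above is straightforward to compute from a feature matrix. A minimal sketch with synthetic features (all data and constants illustrative):

```python
import numpy as np

def fc_ratio(features, labels):
    """Intra-class to inter-class variance ratio of a feature matrix.

    features: (num_points, dim) array of feature vectors phi_{i,j};
    labels: class index of each row. Lower values indicate tighter,
    better-separated classes.
    """
    mu = features.mean(axis=0)                      # global mean
    intra, inter, classes = 0.0, 0.0, np.unique(labels)
    for c in classes:
        cls = features[labels == c]
        mu_c = cls.mean(axis=0)                     # class mean mu_i
        intra += np.sum((cls - mu_c) ** 2) / len(cls)
        inter += np.sum((mu_c - mu) ** 2)
    return (intra / len(classes)) / (inter / len(classes))

rng = np.random.default_rng(0)
tight = rng.normal(scale=0.1, size=(50, 8))
a = np.concatenate([tight, tight + 5.0])            # two compact, far classes
b = rng.normal(scale=3.0, size=(100, 8))            # two overlapping classes
y = np.repeat([0, 1], 50)
print(fc_ratio(a, y) < fc_ratio(b, y))  # True: clustered features score lower
```

Well-clustered features (compact classes, distant means) drive the numerator down and the denominator up, exactly the behavior the regularizer rewards.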
Why is clustering important?
• As the features within a class spread out and the classes are brought closer together, the classification boundaries formed by sampling one-shot data often misclassify large regions.
• As the features within a class become compact and the classes move farther apart, the intra-class to inter-class variance ratio drops, and the class boundary depends less on the choice of one-shot samples.
Comparing Feature Representations of Meta-Learning and Classically Trained Models
• Three classes are randomly chosen from the test set, and 100 samples are taken from each class. The samples are passed through the feature extractor, and the resulting vectors are plotted.
• Because the feature space is high-dimensional, we perform a linear projection onto the first two component directions determined by LDA.
• Linear discriminant analysis (LDA) projects data onto directions that minimize the intra-class to inter-class variance ratio.
• The classically trained model mashes the features together, while the meta-learned models draw the classes farther apart.
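The LDA projection used for these plots can be sketched from scratch with scatter matrices. This is a minimal textbook implementation on synthetic features, not the paper's plotting code; the regularization constant `eps` is an assumption for numerical stability.

```python
import numpy as np

def lda_2d(features, labels, eps=1e-6):
    """Project features onto the top-2 LDA discriminant directions."""
    mu = features.mean(axis=0)
    d = features.shape[1]
    Sw, Sb = eps * np.eye(d), np.zeros((d, d))
    for c in np.unique(labels):
        cls = features[labels == c]
        mu_c = cls.mean(axis=0)
        Sw += (cls - mu_c).T @ (cls - mu_c)      # within-class scatter
        diff = (mu_c - mu)[:, None]
        Sb += len(cls) * (diff @ diff.T)         # between-class scatter
    # Leading eigenvectors of Sw^{-1} Sb give the discriminant directions,
    # i.e. the directions with the best inter- to intra-class variance.
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]
    W = vecs[:, order[:2]].real
    return features @ W

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(m, 0.5, size=(100, 8)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 100)
Z = lda_2d(X, y)
print(Z.shape)  # (300, 2)
```

The resulting 2-D points `Z` are what gets scatter-plotted per class in the comparison figures.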
Hyperplane Invariance
We also consider a regularizer that penalizes variations in the maximum-margin hyperplane separating feature vectors from opposite classes.
Hyperplane Variation Regularizer:
Given data points x1, x2 in class A and y1, y2 in class B, and the feature extractor f_θ, let v_k = f_θ(x_k) − f_θ(y_k). The vector v_1 determines the direction of the maximum-margin hyperplane separating the two points in feature space.
• The regularizer measures the distance between the difference vectors v_1 and v_2 relative to their size:

R_HV = ||v_1 − v_2|| / (||v_1|| + ||v_2||)

• In practice, during a batch of training, we sample many pairs of classes and two samples from each class. We then compute R_HV on all class pairs and add these terms to the cross-entropy loss.
• We find that this regularizer performs almost as well as the feature clustering regularizer and conclusively outperforms non-regularized classical training.
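The penalty can be computed for a single pair of cross-class pairs as below. The normalization by the sum of norms follows the "relative to their size" description above, and the toy feature vectors are illustrative.

```python
import numpy as np

def r_hv(fx1, fx2, fy1, fy2):
    """Hyperplane-variation penalty for two cross-class pairs of features.

    v1, v2 set the direction of the max-margin hyperplane between each
    pair; the penalty is their distance relative to their size.
    """
    v1, v2 = fx1 - fy1, fx2 - fy2
    return np.linalg.norm(v1 - v2) / (np.linalg.norm(v1) + np.linalg.norm(v2))

# Clustered classes: both difference vectors point the same way.
lo = r_hv(np.array([0., 0.1]), np.array([0., -0.1]),
          np.array([5., 0.1]), np.array([5., -0.1]))
# Scattered classes: the separating direction swings between samples.
hi = r_hv(np.array([0., 3.]), np.array([1., -3.]),
          np.array([2., 2.]), np.array([-3., 4.]))
print(lo < hi)  # True
```

A small R_HV means the one-shot separating hyperplane barely depends on which samples were drawn, which is precisely the invariance the regularizer encourages.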
Experiments
• Feature clustering and hyperplane variation values are computed.
• These two quantities measure the intra-class to inter-class variance ratio and the invariance of separating hyperplanes.
• Lower values of each measurement correspond to better class separation.
• On both CIFAR-FS and mini-ImageNet, the meta-learned models attain lower values, indicating that feature-space clustering plays a role in the effectiveness of meta-learning.
Experiments
• We incorporate these regularizers into a standard classical training routine.
• In all experiments, feature clustering improves the performance of transfer learning, and sometimes even achieves higher performance than meta-learning.
Weight Clustering: Finding Clusters of Local Minima for Task Losses in Parameter Space
• Since Reptile does not fix the feature extractor during fine-tuning, it must find parameters that adapt easily to new tasks.
• We hypothesize that Reptile finds parameters that lie very close to good minima for many tasks and is therefore able to perform well on these tasks after very little fine-tuning.
• This hypothesis is further motivated by the close relationship between Reptile and consensus optimization.
• In a consensus method, a number of models are independently optimized with their own task-specific parameters, and the tasks communicate via a penalty that encourages all the individual solutions to converge around a common value.
Consensus Formulation:
• Reptile can be interpreted as approximately minimizing a consensus formulation of the form
min_θ Σ_p [ L_p(θ̃_p) + (ρ/2) ||θ̃_p − θ||² ],
where θ̃_p are the task-specific parameters for task p.
• Reptile diverges from a traditional consensus optimizer only in that it does not explicitly consider the quadratic penalty term when minimizing for θ̃_p.
Consensus Optimization Improves Reptile
• We modify Reptile to explicitly enforce parameter clustering around a consensus value.
• We find that directly optimizing the consensus formulation leads to improved performance.
• During each inner-loop update step in Reptile, we penalize the squared distance from the current task's parameters to the average of the parameters across all tasks in the current batch, weighted by a coefficient α.
• This is equivalent to the original Reptile when α = 0. We call this method "Weight-Clustering."
Reptile with Weight-Clustering Regularizer
• n: number of meta-training steps
• k: number of iterations (inner steps) performed within each meta-training step
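The regularized inner loop can be sketched on toy quadratic task losses. This is an illustrative reconstruction of the scheme described above, not the paper's algorithm listing: the quadratic losses, batch size, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_grad(theta, w_true):
    # Gradient of a toy quadratic task loss 0.5 * ||theta - w_true||^2.
    return theta - w_true

theta = np.zeros(4)                       # meta-parameters
inner_lr, outer_lr, alpha, k = 0.1, 0.5, 0.3, 5

for step in range(100):                   # n meta-training steps
    tasks = [rng.normal(size=4) for _ in range(4)]      # batch of tasks
    finetuned = [theta.copy() for _ in tasks]
    for _ in range(k):                    # k inner steps per task
        mean_params = np.mean(finetuned, axis=0)
        for p, w_true in enumerate(tasks):
            g = task_grad(finetuned[p], w_true)
            # Weight-clustering penalty: pull each task's parameters
            # toward the batch-average parameters (alpha = 0 recovers
            # plain Reptile).
            g = g + alpha * (finetuned[p] - mean_params)
            finetuned[p] = finetuned[p] - inner_lr * g
    # Reptile-style outer update toward the fine-tuned solutions.
    theta = theta + outer_lr * (np.mean(finetuned, axis=0) - theta)
```

The penalty term is exactly the squared-distance-to-batch-average gradient described in the previous slide, applied at every inner step.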
Results of Weight Clustering
• We compare the performance of our regularized Reptile algorithm to that of the original Reptile method, first-order MAML (FOMAML), and a classically trained model of the same architecture. We test these methods on a sample of 100,000 5-way 1-shot and 5-shot mini-ImageNet tasks.
• Reptile with Weight-Clustering achieves higher performance.
Results of Weight Clustering
• Parameters of networks trained using our regularized version of Reptile do not travel as far during fine-tuning at inference as those trained using vanilla Reptile.
• From this, we conclude that our regularizer does indeed move parameters closer to good minima for many tasks.