Towards Data-Efficient Deep Learning with Meta-Learning and Symmetries

Jin Xu

Balliol College

University of Oxford

A thesis submitted for the degree of Doctor of Philosophy in Statistics

Trinity 2023

Acknowledgements

First and foremost, I want to express my deep gratitude to my supervisors, Prof. Yee Whye Teh and Dr. Tom Rainforth. Their unwavering support, careful guidance, and constant inspiration have been invaluable throughout my PhD journey. It has been a privilege to be mentored by them, whom I regard as research role models. Their depth and breadth of knowledge have been both humbling and enlightening. Special acknowledgement goes to Yee Whye, who has always been considerate and ready to help in tough times. My heartfelt thanks go to Tom for his guidance during the challenging times brought on by the pandemic.

I would like to extend my gratitude to all my collaborators: Hyunjik Kim, Jean-Francois Ton, Adam Kosiorek, Emilien Dupont, and Kaspar Märtens. Their expertise and feedback have been crucial in improving my work, and I have learned a great deal from them. A big thank you to Prof. Ryan Adams from Princeton University and to my internship hosts, James Hensman and Max Croci at Microsoft Research. Their mentorship outside of my PhD life has been an indispensable part of my research experience.

Moreover, I feel extremely fortunate to be surrounded by amazing and caring friends, too many to enumerate here. Among them are Emilien Dupont, Jean-Francois Ton, Charline Le Lan, Bobby He, Sheheryar Zaidi, Qinyi Zhang, Guneet Dhillon, Andrew Campbell, Chris Williams, Carlo Alfano, Faaiz Taufiq, Anna Menacher and others from our lovely office 1.17; Hanwen Xing, Yanzhao Yang, Ning Miao, Chao Zhang, Yutonglu, Yixuan He, Xi Lin, Yuan Zhou, Fan Wu, Bohao Yao from the Department of Statistics; Dunhong Jin, Sihan Zhou, Sijia Yao, Huining Yang, Kevin Wang, Natalia Hong, Hang Yuan, Kangning Zhang, Chengyang Wang and many others from other departments at Oxford; Deniz Oktay, Sulin Liu, Jenny Zhan and others from Princeton University; and internship peers at Microsoft Research, including Alexander Meulemans, Saleh Ashkboos from ETH.

A special thanks to all university and department staff, especially Chris Cullen for his kind and patient support during difficult times, and to Joanna Stoneham, Stuart McRobert, and others who ensured a smooth PhD experience.

Finally, above all, my deepest thanks go to Yifan Yu for her love and companionship. She immensely enriched my time in Oxford, bringing colour and joy to my life. Additionally, I am eternally grateful to my parents Chengxiang Xu and Feng Chen for giving me the freedom to pursue my passions and for their unquestioning support throughout this journey.

Abstract

Recent advances in deep learning have been significantly propelled by the increasing availability of data and computational resources. While the abundance of data enables models to perform well in certain domains, there are real-world applications, such as in the medical field, where data is scarce or difficult to collect. Furthermore, there are also scenarios where a large dataset is better viewed as many related small datasets, and the data becomes insufficient for the task associated with any one of the small datasets. It is also noteworthy that human intelligence often requires only a handful of examples to perform well on new tasks, emphasizing the importance of designing data-efficient AI systems. This thesis delves into two strategies to address this challenge: meta-learning and symmetries. Meta-learning approaches the data-rich environment as a collection of many small, individual datasets. Each of these small datasets represents a distinct task, yet there is underlying shared knowledge between them. Harnessing this shared knowledge allows for the design of learning algorithms that can efficiently address new tasks within similar domains. In comparison, symmetry is a form of direct prior knowledge. By ensuring that models' predictions remain consistent under the relevant transformations of their inputs, these models enjoy better sample efficiency and generalization.

In the subsequent chapters, we present novel techniques and models, all of which aim at improving the data efficiency of deep learning systems. Firstly, we demonstrate the success of encoder-decoder style meta-learning methods based on Conditional Neural Processes (CNPs). Secondly, we introduce a new class of expressive meta-learned stochastic process models which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. Finally, we propose group equivariant subsampling/upsampling layers which tackle the loss of equivariance in conventional subsampling/upsampling layers. These layers can be used to construct end-to-end equivariant models with improved data efficiency.

Contents

1 Introduction
1.1 Motivation
1.2 Thesis outline
1.3 Papers

2 Background
2.1 Meta-learning
2.1.1 Conventional supervised learning and meta-learning
2.1.2 Different views of meta-learning
2.1.3 Common approaches to meta-learning
2.2 Neural processes
2.2.1 Stochastic processes
2.2.2 Neural processes as stochastic processes
2.2.3 Neural process training objectives
2.2.4 A meta-learning perspective
2.3 Symmetries in deep learning
2.3.1 Group, coset and quotient space
2.3.2 Group homomorphism, group actions and group equivariance
2.3.3 Homogeneous spaces and lifting feature maps
2.3.4 Feature maps in G-CNNs
2.3.5 Group equivariant neural networks

3 MetaFun: Meta-Learning with Iterative Functional Updates
3.1 Introduction
3.2 MetaFun
3.2.1 Learning functional task representation
3.2.2 MetaFun for regression and classification
3.3 Related work
3.4 Experiments
3.4.1 1-D function regression
3.4.2 Classification: miniImageNet and tieredImageNet
3.4.3 Ablation study
3.5 Conclusions and future work
3.6 Supplementary materials
3.6.1 Functional gradient descent
    Reproducing kernel Hilbert space
    Functional gradients
    Functional gradient descent
3.6.2 Experimental details

4 Deep Stochastic Processes via Functional Markov Transition Operators
4.1 Introduction
4.2 Background
4.3 Markov neural processes
4.3.1 A more general form of Neural Process density functions
4.3.2 Markov chains in function space
4.3.3 Parameterisation, inference and training
4.4 Related work
4.5 Experiments
4.5.1 1D function regression
4.5.2 Contextual bandits
4.5.3 Geological inference
4.6 Discussion
4.7 Supplementary materials
4.7.1 Proofs
4.7.2 Implementation details
4.7.3 Data
    Model architectures and hyperparameters
    Computational costs and resources
4.7.4 Broader impacts

5 Group Equivariant Subsampling
5.1 Introduction
5.2 Equivariant subsampling and upsampling
5.2.1 Translation equivariant subsampling for CNNs
5.2.2 Group equivariant subsampling and upsampling
5.2.3 Constructing Φ
5.3 Application: Group equivariant autoencoders
5.4 Related work
5.5 Experiments
5.5.1 Basic properties: Equivariance, disentanglement and out-of-distribution generalization
5.5.2 Single object
5.5.3 Multiple objects
5.6 Conclusions, limitations and future work
5.7 Supplementary materials
5.7.1 Equivariant subsampling and upsampling
    Constructing Φ
    Multiple subsampling layers
5.7.2 Group equivariant autoencoders
5.7.3 Proofs
5.7.4 Implementation details
    Data
    Model architectures
    Hyperparameters
    Computational resources

6 Conclusions and Future Outlook

Bibliography

Chapter 1

Introduction

1.1 Motivation

Recent breakthroughs in deep learning can be largely attributed to the vast amount of data available and the advancement of computational resources [Deng et al., 2009, Raina et al., 2009, Silver et al., 2016, Jumper et al., 2021, Brown et al., 2020a]. While training on large datasets enables deep learning models to excel in certain tasks, many real-world applications only provide limited data for a specific task. For instance, in medical fields, obtaining data, especially for rare diseases, is challenging and often expensive. In drug development or recommendation systems, there will always be insufficient data for new drugs/users, even though abundant data exists for other drugs or users. Therefore, to apply deep learning to these fields, it is vital to develop systems that are data-efficient. Moreover, for advanced AI systems, data efficiency can be a crucial ingredient. Firstly, AI systems should be able to generalize beyond specific data distributions without relying on data; for instance, an image recognition system should recognize objects regardless of their position or orientation. Secondly, human intelligence can often solve new tasks with just a few examples. Thus, for AI to emulate human-like intelligence, it should also have such capability.

From a Bayesian perspective, learning involves updating our beliefs about a model (represented by $\theta$) given the data, i.e. computing the posterior $p(\theta \mid \mathcal{D}_{\text{data}}) \propto p(\mathcal{D}_{\text{data}} \mid \theta)\, p(\theta)$. For a model to learn efficiently from a small amount of data, it is important to start with a good initial guess, or "prior", $p(\theta)$. In this thesis, we look at two directions for obtaining such a prior for data-efficient learning. The first is meta-learning, which learns the prior (or the shared knowledge) from similar tasks. It can be understood as "learning to learn more efficiently". The second is symmetries in deep learning, which serve as a known prior for certain problems. Symmetry, a fundamental concept in physics, represents a form of prior knowledge that is ubiquitously observed throughout our physical world.

Meta-learning tackles a specific scenario in which the vast pool of data can be viewed as many small datasets, each representing a distinct task. Yet, these tasks contain underlying shared knowledge that can be harnessed to address new tasks within the same category. This scenario is prevalent in many applications. Take, for instance, an online retail company with data from customers worldwide. The data associated with each user is typically sparse. In this context, predicting behaviours for each user constitutes an individual task, but patterns among different users often exhibit similarities. Meta-learning algorithms are designed to handle such circumstances. The goal of meta-learning is to learn data-efficient learning algorithms that can later be applied to a particular task. The training data for meta-learning comprises numerous related tasks, each with a limited set of data points. After the meta-learning phase, the learned learning algorithms can solve a new task in a data-efficient manner. In contrast, the aim of conventional supervised learning is just to learn a predictive model.

Meta-learning problems can be tackled from various perspectives, and the resulting methods can be grouped into optimization-based approaches [Ravi and Larochelle, 2016, Finn et al., 2017a], metric-based approaches [Koch, 2015, Vinyals et al., 2016, Sung et al., 2018, Snell et al., 2017], and model-based approaches [Santoro et al., 2016, Mishra et al., 2018, Garnelo et al., 2018a], among others. Note that these views are not mutually exclusive. For example, methods such as Prototypical Networks [Snell et al., 2017], MAML [Finn et al., 2017a], and ML-PIP [Gordon et al., 2018] can be reformulated under a model-based framework that uses an encoder-decoder setup. In this setup, the encoder produces a task representation using training data, and the decoder then makes predictions based on the task representation. These approaches transform the meta-learning challenge to resemble a regular learning problem involving sequences, and they are also more computationally efficient when no gradient computation is involved in either the encoder or the decoder, as in CNP-type models [Garnelo et al., 2018a]. Our study in Chapter 3 explicitly adopts this encoder-decoder framework for meta-learning. By using a functional task representation, and iteratively updating the representation directly in function space, we demonstrate that encoder-decoder approaches without gradient information can also be competitive with other approaches, which has not been shown before.
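The encoder-decoder setup described above can be sketched in a few lines. This is a minimal illustration in the spirit of CNP-type models, not the method of Chapter 3: the network sizes, the mean aggregation, the untrained random weights, and all function names (`init_mlp`, `encode`, `decode`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random weights for a small MLP (untrained; for illustration only)."""
    return [(rng.normal(0, 0.5, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply the MLP with tanh activations on all but the last layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def encode(enc_params, x_ctx, y_ctx):
    """Embed each (x, y) context pair, then average over the context set.
    Averaging makes the task representation invariant to context ordering."""
    pairs = np.concatenate([x_ctx, y_ctx], axis=-1)
    return mlp(enc_params, pairs).mean(axis=0)

def decode(dec_params, r, x_tgt):
    """Predict target outputs from target inputs and the task representation r."""
    r_tiled = np.tile(r, (x_tgt.shape[0], 1))
    return mlp(dec_params, np.concatenate([x_tgt, r_tiled], axis=-1))

enc_params = init_mlp([2, 16, 8], rng)       # (x, y) pair -> 8-dim embedding
dec_params = init_mlp([1 + 8, 16, 1], rng)   # (x_target, r) -> prediction

# One task: a handful of context points and a grid of target inputs.
x_ctx = rng.uniform(-1, 1, (5, 1))
y_ctx = np.sin(3 * x_ctx)
x_tgt = np.linspace(-1, 1, 7).reshape(-1, 1)

r = encode(enc_params, x_ctx, y_ctx)
y_pred = decode(dec_params, r, x_tgt)
```

Adapting to a new task here is a single forward pass through `encode`; no gradient step is taken on the context set, which is the source of the computational saving mentioned above.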

Furthermore, because training data for each task in meta-learning is often limited, uncertainty estimation becomes crucial. Stochastic Processes (SPs), e.g. Gaussian Processes (GPs), can be used to make predictions with uncertainty estimation. Thus, learning these processes can be seen as a way to approach meta-learning with uncertainty in mind. In Chapter 4, we propose a new framework to construct expressive neural parameterised SPs by parameterising Markov transitions in function space.

Unlike meta-learning above, which discovers shared knowledge from related tasks, symmetry serves as a direct form of prior or inductive bias, integrated into deep learning models without the need for pre-training. Symmetries refer to transformations that leave certain properties of an object of interest unchanged. These include transformations such as image translation, rotation, or permutation of set elements. By incorporating these symmetries into deep learning models, ensuring that the outputs remain consistent (the same, or undergoing the corresponding transformation) despite input transformations, the model inherently generalizes to transformed inputs. Consequently, deep learning models equipped with these symmetries not only become more data-efficient but also generalize better. A simple example of this is Convolutional Neural Networks (CNNs), which are approximately invariant to input translations for classification tasks, and perform significantly better than plain feed-forward networks. Earlier research has introduced many methods to build convolutional [Cohen and Welling, 2016, 2017, Cohen et al., 2019] and attention blocks [Hutchinson et al., 2021, Fuchs et al., 2020] that are equivariant w.r.t. various symmetries. However, the pooling layers or subsampling/upsampling layers commonly used in various deep learning architectures break these symmetries [Zhang, 2019]. In Chapter 5, we present group equivariant subsampling/upsampling layers that have exact equivariance.
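The contrast between an equivariant convolution and a symmetry-breaking subsampling layer can be checked numerically. The sketch below assumes a 1-D signal with circular (wrap-around) boundary conditions and a fixed stride-2 subsampling; the helper names are hypothetical. It only illustrates the failure mode described above and is not the equivariant layer proposed in Chapter 5.

```python
import numpy as np

def circ_conv(x, k):
    """Circular 1-D cross-correlation: y[i] = sum_j k[j] * x[(i + j) % n]."""
    n = len(x)
    return np.array([sum(k[j] * x[(i + j) % n] for j in range(len(k)))
                     for i in range(n)])

def shift(x, t):
    """Cyclic translation of a signal by t positions."""
    return np.roll(x, t)

def subsample(x, stride=2):
    """Conventional subsampling: keep every `stride`-th entry, starting at index 0."""
    return x[::stride]

x = np.array([0., 1., 3., 2., 5., 4., 7., 6.])
k = np.array([1., -2., 1.])

# Convolution commutes with translation: conv(shift(x)) == shift(conv(x)).
conv_equivariant = np.allclose(circ_conv(shift(x, 1), k),
                               shift(circ_conv(x, k), 1))

# Subsampling does not: after a one-step input shift, the output is not
# a translated copy of the original output for any output shift.
sub = subsample(x)
shifted_sub = subsample(shift(x, 1))
sub_equivariant = any(np.allclose(shifted_sub, shift(sub, t))
                      for t in range(len(sub)))

print(conv_equivariant, sub_equivariant)
```

Shifting the input by a multiple of the stride is the special case where subsampling still commutes with translation: `subsample(shift(x, 2))` equals `shift(subsample(x), 1)`.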

1.2 Thesis outline

In Chapter 2, we provide a short introduction to meta-learning, neural processes and symmetries in deep learning, to set the stage for later chapters.

In Chapter 3, we introduce an iterative functional encoder-decoder method for supervised meta-learning, which is based on Neural Processes (NPs) [Garnelo et al., 2018a,b]. On standard few-shot classification benchmarks like miniImageNet and tieredImageNet, we demonstrate that meta-learning methods based on the neural process family can be competitive with, or even outperform, gradient-based methods such as MAML [Finn et al., 2017a] and LEO [Rusu et al., 2019].

In Chapter 4, we introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. The proposed iterative construction adds substantial flexibility and expressivity to the original framework of Neural Processes (NPs) without compromising consistency or adding restrictions. Our experiments demonstrate clear advantages of MNPs over baseline models on a variety of tasks. It is noteworthy that SP models can be viewed through a meta-learning lens, so the proposed method can also be seen as a meta-learning approach with principled uncertainty estimation.

In Chapter 5, we first introduce translation equivariant subsampling/upsampling layers that can be used to construct exact translation equivariant CNNs. We then generalise these layers beyond translations to general groups, thus proposing group equivariant subsampling/upsampling. We use these layers to construct group equivariant autoencoders (GAEs) that allow us to learn low-dimensional equivariant representations. We empirically verify on images that the representations are indeed equivariant to input translations and rotations, and thus generalise well to unseen positions and orientations. We further use GAEs in models that learn object-centric representations on multi-object datasets, and show improved data efficiency and decomposition compared to non-equivariant baselines.

In Chapter 6, we summarize our findings and explore potential avenues for future research to further advance the field.

1.3 Papers

This is an integrated thesis and includes the following published papers:

Chapter 3 contains:

Xu, J., Ton, J. F., Kim, H., Kosiorek, A., & Teh, Y. W. Metafun: Meta-learning with iterative functional updates. International Conference on Machine Learning (ICML), 2020 [Xu et al., 2020]

Chapter 4 contains:

Xu, J., Dupont, E., Märtens, K., Rainforth, T., & Teh, Y. W. (2023). Deep Stochastic Processes via Functional Markov Transition Operators. Advances in Neural Information Processing Systems (NeurIPS), 2023 [Xu et al., 2023]

Chapter 5 contains:

Xu, J., Kim, H., Rainforth, T., & Teh, Y. (2021). Group equivariant subsampling. Advances in Neural Information Processing Systems (NeurIPS), 2021 [Xu et al., 2021]

Chapter 2

Background

2.1 Meta-learning

2.1.1 Conventional supervised learning and meta-learning

In conventional supervised learning, the objective is to learn a function $f$ that maps an input feature vector $x \in \mathcal{X}$ to an output label $y \in \mathcal{Y}$. Learning is based on example input-output pairs in a training set $\mathcal{D}_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{m}$. Common types of supervised learning tasks include regression, where output labels are real-valued, and classification, where the output labels represent different classes. The function $f$, often referred to as the predictive model, is a member of a hypothesis class $\mathcal{H} := \{ f \mid f(x; \phi),\ \phi \in \mathbb{R}^{d_\phi} \}$.

For each task, there is a risk function $\ell(y, f(x))$ which measures prediction error. As an example, in the context of a regression task, $\ell$ often takes the form of a squared error, $\ell(y, f(x)) = (y - f(x))^2$. The training process of the model $f$ translates to solving an optimization problem defined as follows:

$$\hat{f} = \arg\min_{f \in \mathcal{H}} \frac{1}{m} \sum_{i=1}^{m} \ell(y_i, f(x_i)). \tag{2.1}$$

It is called empirical risk minimization because this objective is an estimate of the population risk $\mathbb{E}_{(x, y) \sim p(x, y)}[\ell(y, f(x))]$ based on the empirical distribution of the training data.

After training, the model should generalize effectively when presented with a test set, denoted as $\mathcal{D}_{\text{test}} = \{(x_i, y_i)\}_{i=m+1}^{n}$. The model's performance can be assessed using the test risk $R(\hat{f}; \mathcal{D}_{\text{test}})$, the average of $\ell$ over the test set, which serves as an estimate of the overall population risk using unseen data.
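As a concrete instance of empirical risk minimization and test-risk evaluation, the following sketch fits a linear hypothesis class under squared error. The synthetic data, the linear model $f(x; \phi) = x^\top \phi$, and the function names are assumptions chosen for illustration; for this particular choice of loss and hypothesis class the minimizer is the ordinary least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(0)

def squared_error(y, y_hat):
    """Pointwise risk l(y, f(x)) = (y - f(x))^2."""
    return (y - y_hat) ** 2

def empirical_risk(phi, x, y):
    """Average risk of the linear model f(x; phi) = x @ phi on a dataset."""
    return squared_error(y, x @ phi).mean()

# Synthetic data (assumed for illustration): y = 2*x1 - x2 + noise.
n, m = 200, 150                      # n points in total, the first m for training
true_phi = np.array([2.0, -1.0])
x = rng.normal(size=(n, 2))
y = x @ true_phi + 0.1 * rng.normal(size=n)
x_train, y_train = x[:m], y[:m]
x_test, y_test = x[m:], y[m:]

# With squared error and a linear hypothesis class, the empirical risk
# minimizer has a closed form: the least-squares solution.
phi_hat, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

train_risk = empirical_risk(phi_hat, x_train, y_train)
test_risk = empirical_risk(phi_hat, x_test, y_test)   # estimate of the population risk
print(phi_hat, train_risk, test_risk)
```

By construction `phi_hat` attains a training risk no larger than that of any other parameter vector, including the data-generating `true_phi`; the test risk is computed on points that played no part in the fit.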

Figure 2.1: Data for a meta-classification problem. Both the meta-training and meta-test sets consist of tasks (red rectangles) and are presumed to come from the same task distribution $p(\mathcal{T})$. Each of these tasks encompasses its own task-specific training and test sets, which are commonly referred to as the context (yellow labels) and the target (grey labels) respectively.

In practice, it is common to have scenarios where many supervised learning tasks are related to each other, yet the number of data points for each individual task is limited. Meta-learning emerges as a new learning paradigm to address such challenges.

Specifically, we have a meta-training set defined as $\mathcal{M}_{\text{train}} = \{(\mathcal{D}^{(j)}_{\text{train}}, \mathcal{D}^{(j)}_{\text{test}}, \ell^{(j)})\}_{j=1}^{M}$ and a meta-test set given by $\mathcal{M}_{\text{test}} = \{(\mathcal{D}^{(j)}_{\text{train}}, \mathcal{D}^{(j)}_{\text{test}}, \ell^{(j)})\}_{j=M+1}^{N}$. Each element in these meta-datasets is a tuple consisting of a training set (called the context), a test set (called the target) and a risk function (typically the same within a meta-dataset). This 3-tuple characterizes a task $\mathcal{T}_j$ (see Figure 2.1 for an illustration). In supervised learning, we use training data to train a predictive model, hoping it can generalize across the entire data distribution. In meta-learning, the assumption is that there is a common task distribution, denoted as $p(\mathcal{T})$, from which both the meta-training set and the meta-test set are drawn. Meta-learning algorithms aim to use meta-training data to discover learning algorithms that can generalize across the entire task distribution.

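More specifically, a learning algorithm for a supervised learning task takes in a training set $\mathcal{D}_{\text{train}}$ and a risk function $\ell$, and outputs a predictive model, written as:

$$\hat{f} = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}, \ell). \tag{2.2}$$

Since $\ell$ is usually fixed, we will omit the dependency on it in subsequent discussions. For a particular task, the learning algorithm $\Phi_{\text{ALGO}}$ can be evaluated by the test risk of the learned predictive model, denoted as:

$$R(\hat{f}; \mathcal{D}_{\text{test}}). \tag{2.3}$$

Meta-learning finds a learning algorithm based on tasks from the meta-training set $\mathcal{M}_{\text{train}}$, so that this learning algorithm can be more efficiently applied to new tasks, and generalizes across the task distribution $p(\mathcal{T})$. The meta-learning algorithm can be represented as:

$$\Phi_{\text{ALGO}} = \text{MetaAlgo}(\mathcal{M}_{\text{train}}). \tag{2.4}$$

To evaluate the meta-learning algorithm, we can compute:

$$\frac{1}{N - M} \sum_{j=M+1}^{N} R\big(\Phi_{\text{ALGO}}(\mathcal{D}^{(j)}_{\text{train}});\ \mathcal{D}^{(j)}_{\text{test}}\big), \tag{2.5}$$

where the sum ranges over the $N - M$ tasks of the meta-test set. While it resembles the test loss in supervised learning, the aggregated test risk for a task replaces the traditional risk function for a data point.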

It is worth noting that while we focus on supervised learning tasks here, meta-learning can be extended to unsupervised learning [Edwards and Storkey, 2016, Reed et al., 2018, Hsu et al., 2018] or reinforcement learning [Wang et al., 2016, Finn et al., 2017a,b].

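2.1.2 Different views of meta-learning

Bi-level optimization view. Let us assume both the predictive model $f$ and the learning algorithm $\Phi_{\text{ALGO}}$ can be parameterised, and the parameters are denoted as $\phi$ and $\theta$ respectively. That is to say, the learning algorithm can be written as:

$$\phi = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}}; \theta). \tag{2.6}$$

Meta-learning can be formulated as the following bi-level optimization problem:

$$\theta^{*} = \arg\min_{\theta} \frac{1}{M} \sum_{j=1}^{M} R\big(f(\cdot\,; \phi_j(\theta));\ \mathcal{D}^{(j)}_{\text{test}}\big), \tag{2.7}$$

where $R(\cdot\,; \mathcal{D}^{(j)}_{\text{test}})$ is the test risk on the target set of the $j$-th of $M$ meta-training tasks, and the task-specific parameter $\phi_j$ depends on $\theta$ through the inner-loop optimization:

$$\phi_j(\theta) = \Phi_{\text{ALGO}}(\mathcal{D}^{(j)}_{\text{train}}; \theta). \tag{2.8}$$

Many meta-learning algorithms are developed based on this bi-level optimization view, such as Finn et al. [2017a], Nichol et al. [2018], Ravi and Larochelle [2016].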
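A small numerical sketch may help make the bi-level structure tangible. It assumes a scalar model $f(x; \phi) = \phi x$, squared error, and an inner loop consisting of one gradient step on the context loss starting from $\theta$ (a MAML-style choice made for illustration; the step sizes and task family are likewise assumptions). The outer loop differentiates the target loss through the inner step.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1   # inner-loop step size (assumed)

def make_task(rng, n=10):
    """One task: y = a * x + noise, split into a context set and a target set."""
    a = rng.normal(2.0, 0.3)
    x = rng.uniform(-2, 2, 2 * n)
    y = a * x + 0.05 * rng.normal(size=2 * n)
    return (x[:n], y[:n]), (x[n:], y[n:])

def loss(phi, data):
    x, y = data
    return np.mean((y - phi * x) ** 2)

def grad_loss(phi, data):
    """Derivative of the mean squared error with respect to phi."""
    x, y = data
    return 2.0 * (phi * np.mean(x * x) - np.mean(x * y))

def inner(theta, ctx):
    """Inner loop phi_j(theta): one gradient step on the context loss from theta."""
    return theta - ALPHA * grad_loss(theta, ctx)

def outer_objective(theta, tasks):
    """Outer loop objective: average target loss after inner-loop adaptation."""
    return np.mean([loss(inner(theta, ctx), tgt) for ctx, tgt in tasks])

def meta_grad(theta, tasks):
    """Chain rule through the inner step: dL_target/dphi * dphi/dtheta."""
    grads = []
    for ctx, tgt in tasks:
        dphi_dtheta = 1.0 - 2.0 * ALPHA * np.mean(ctx[0] ** 2)
        grads.append(grad_loss(inner(theta, ctx), tgt) * dphi_dtheta)
    return np.mean(grads)

tasks = [make_task(rng) for _ in range(50)]
theta = 0.0
initial = outer_objective(theta, tasks)
for _ in range(200):                     # outer-loop gradient descent on theta
    theta -= 0.05 * meta_grad(theta, tasks)
final = outer_objective(theta, tasks)
print(theta, initial, final)
```

The factor `dphi_dtheta` is what distinguishes the meta-gradient from an ordinary gradient: it accounts for how the adapted parameter changes when the shared parameter changes.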

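Hierarchical model view. From a probabilistic perspective, the generative process for each task $\mathcal{T}_j$ can be expressed as:

$$\theta \sim p(\theta), \quad \phi_j \sim p(\phi_j \mid \theta), \quad y_i^{(j)} \sim p(y_i^{(j)} \mid x_i^{(j)}, \phi_j, \theta). \tag{2.9}$$

Both the training set $\mathcal{D}^{(j)}_{\text{train}}$ and the test set $\mathcal{D}^{(j)}_{\text{test}}$ follow the same distribution (as illustrated in Figure 2.2). This can be seen as a probabilistic hierarchical model where $\theta$ indicates the high-level global parameters for all tasks and $\phi_j$ denotes the low-level local parameters for each task. In this context, meta-learning is about inferring $\theta$ from lots of tasks in the meta-training set, that is $p(\theta \mid \mathcal{M}_{\text{train}})$. Learning, on the other hand, infers $\phi_j$ given the training set $\mathcal{D}^{(j)}_{\text{train}}$ for task $\mathcal{T}_j$, that is $p(\phi_j \mid \theta, \mathcal{D}^{(j)}_{\text{train}})$.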

Figure 2.2: Meta-learning as hierarchical models (a remake of Figure 1 in Gordon et al. [2018]). The task-specific parameter $\phi_j$ depends on the global parameter $\theta$. Data points in both the context and the target have the same generative process, which depends on both $\theta$ and $\phi_j$.

Note that $p(\phi_j \mid \theta)$ can be seen as a prior for task $\mathcal{T}_j$ conditioned on $\theta$. Therefore, meta-learning can be seen as learning an empirical prior from the meta-training set. Finn et al. [2018] and Requeima et al. [2019] adopt this view.
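The hierarchical view can be illustrated with a conjugate Gaussian toy model. The sketch assumes scalar parameters, $\theta \sim \mathcal{N}(0, 1)$, $\phi_j \sim \mathcal{N}(\theta, \tau^2)$ and $y = \phi_j x + \text{noise}$, so that $y$ depends on $\theta$ only through $\phi_j$, a special case of the generative process above. The constants `TAU` and `SIGMA` and the crude point estimate of $\theta$ are choices made for the example.

```python
import numpy as np

TAU, SIGMA = 0.5, 0.1   # assumed std of p(phi_j | theta) and of the observation noise
rng = np.random.default_rng(0)

# Ancestral sampling: global parameter, then task parameters, then data.
theta = rng.normal(0.0, 1.0)
n_tasks, n_points = 1000, 8
phis = rng.normal(theta, TAU, size=n_tasks)
x = rng.uniform(-1, 1, size=(n_tasks, n_points))
y = phis[:, None] * x + SIGMA * rng.normal(size=x.shape)

# Meta-learning: estimate theta from many tasks. Here a crude point estimate,
# the average of the per-task least-squares slopes, stands in for p(theta | M_train).
slopes = (x * y).sum(axis=1) / (x * x).sum(axis=1)
theta_hat = slopes.mean()

# Learning: posterior over phi_j for a single task given theta_hat.
# With a Gaussian prior and Gaussian likelihood the update is closed form.
j = 0
precision = 1.0 / TAU**2 + (x[j] @ x[j]) / SIGMA**2
post_mean = (theta_hat / TAU**2 + (x[j] @ y[j]) / SIGMA**2) / precision
post_var = 1.0 / precision
print(theta, theta_hat, phis[j], post_mean, post_var)
```

The estimated `theta_hat` plays the role of the empirical prior: it centres the prior on $\phi_j$ for a new task, and the task's own context points then sharpen it into a posterior.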

Model-based view. A learning algorithm $f = \Phi_{\text{ALGO}}(\mathcal{D}_{\text{train}})$ can be seen as a function that takes in the entire training set and outputs a predictive model. The model is then used to make predictions on test data in $\mathcal{D}_{\text{test}}$. The learning and prediction processes can thus be conceptualized as sequence-to-sequence mappings. For the sake of brevity, let us use a concise notation for data sequences, such as $x_{1:n} = \{x_1, x_2, \ldots, x_n\}$. For a specific task $\mathcal{T}_j$, making predictions for test set data points based on those from the training set can be described as the following inference task:

$$p(y_{m+1:n} \mid x_{m+1:n}, x_{1:m}, y_{1:m}). \tag{2.10}$$

From this perspective, meta-learning is about creating this conditional model. Meta-learning only differs from conventional supervised learning in that both the inp
