A Survey of Self-Supervised Learning from Multiple Perspectives: Algorithms, Theory, Applications and Future Trends
1 INTRODUCTION

Deep supervised learning algorithms have achieved satisfactory performance in fields such as computer vision (CV) and natural language processing (NLP). Generally, supervised learning algorithms need large numbers of labeled examples to obtain better performance. Models trained on large-scale databases such as ImageNet are widely utilized as pretrained models and then fine-tuned for other tasks (Table 1) for the following two main reasons. First, the parameters learned on large-scale databases provide a good starting point, so networks trained on other tasks can converge more quickly. Second, a network trained on a large-scale database has already learned the relevant hierarchical characteristics, which can help lessen the overfitting problem during the training processes of other tasks, especially when the training instances of other tasks are few or the training labels are limited.

Unfortunately, in many real data mining and machine learning applications, although many unlabeled training instances can be found, usually only a limited number of labeled training instances are available. Labeled examples are often expensive, difficult, or time-consuming to obtain since they require the efforts of experienced human annotators. For instance, in web user profile analysis, it is easy to collect many web user profiles, but labeling the non-profitable or profitable users in these data requires inspection, judgment, and even time-consuming tracing tasks to be performed by experienced human assessors, which is very expensive. As another case, in the medical field, unlabeled examples can be easily obtained from routine medical examinations, but making diagnoses for so many examples in a case-by-case manner imposes a heavy burden on medical experts. For example, to perform breast cancer diagnosis, radiologists must assign labels to every focus in a large number of easily obtained high-resolution mammograms. This process is often very inefficient and time-consuming. Furthermore, supervised learning methods suffer from spurious correlations and generalization errors, and they are vulnerable to adversarial attacks.

To alleviate the two aforementioned limitations of supervised learning, many machine learning paradigms have been proposed, such as active learning, semi-supervised learning, and self-supervised learning (SSL). This paper focuses on SSL. SSL algorithms have been proposed to learn good features from a large number of unlabeled instances without using any human annotations.

The general pipeline of SSL is shown in Fig. 1. During the self-supervised pretraining phase, a predefined pretext task is designed for a deep learning algorithm to solve, and the pseudo labels for the pretext task are automatically generated based on certain attributes of the input data. Then, the deep learning algorithm is trained to solve the pretext task. After the self-supervised pretraining process is completed, the learned model can be further transferred to downstream tasks (especially those for which only a relatively small number of examples are available) as a pretrained model to improve performance and overcome overfitting issues.

TABLE 1: Contrast between supervised and self-supervised pretraining and fine-tuning.

Supervised pretraining (labeled data):
- Image categorization → detection/segmentation/pose estimation/depth estimation, etc.
- Video action categorization → action recognition/object tracking, etc.

Self-supervised pretraining (unlabeled data):
- Image: rotation, jigsaw, etc. → detection/segmentation/pose estimation/depth estimation, etc.
- Video: the order of frames, playing direction, etc. → action recognition/object tracking, etc.
- NLP: masked language modeling → question answering/textual entailment recognition/natural language inference, etc.

Fig. 1: The general pipeline of SSL. Unlabeled data are used to pretrain a model on SSL (pretext) tasks, and the learned parameters then serve as the initialization for downstream tasks, in contrast to supervised pretraining on labeled data followed by transfer.

Fig. 2: Google Scholar search results for "self-supervised learning". The vertical and horizontal axes denote the number of SSL publications and the year, respectively.
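To make the pretrain-then-transfer pipeline described above concrete, here is a minimal PyTorch-style sketch (our illustration, not code from any surveyed paper; the tiny backbone and the class counts are hypothetical placeholders):

```python
import torch.nn as nn

# Backbone to be pretrained on the pretext task and later transferred.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# 1) Self-supervised pretraining: solve a pretext task (e.g., 4-way
#    rotation prediction) whose pseudo labels are generated from the
#    data themselves, with no human annotation.
pretext_model = nn.Sequential(backbone, nn.Linear(16, 4))
# ... train pretext_model on (transformed input, pseudo label) pairs ...

# 2) Transfer: reuse the pretrained backbone as the initialization and
#    fine-tune a new head on the (small) labeled downstream dataset.
downstream_model = nn.Sequential(backbone, nn.Linear(16, 10))
# ... fine-tune downstream_model on labeled downstream data ...
```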
Because no human annotations are required to generate pseudo labels during self-supervised training, one main merit of SSL algorithms is that they can make the most of large-scale unlabeled data. Trained with these pseudo labels, self-supervised algorithms have achieved promising results, and the performance gap between self-supervised and supervised algorithms in downstream tasks has decreased. Asano et al. [1] showed that, even on only a single image, SSL can surprisingly produce low-level characteristics that generalize well.

SSL [2]–[19] has recently attracted increasing attention (Fig. 2). Yann LeCun, one of the recipients of the ACM A.M. Turing Award, gave a keynote talk at the Eighth International Conference on Learning Representations (ICLR 2020) titled "The future is self-supervised". Yann LeCun and Yoshua Bengio, who both received the Turing Award, said that SSL is key to human-level intelligence [20]. According to Google Scholar, a large number of papers related to SSL have already been published; for example, the SSL-related papers published in 2021 amount to approximately 52 papers every day, or more than two papers per hour (Fig. 2). To prevent researchers from becoming lost in so many SSL papers and to collate the latest research findings, we attempt to provide a timely survey of this topic.

Differences From Previous Work: Reviews of SSL are available for specific applications such as recommender systems [21], graphs [22], [23], sequential transfer learning [24], videos [25], adversarial pretraining of self-supervised deep networks [26], and visual feature learning [27]. Liu et al. [18] mainly covered papers written before 2020, and their work did not contain the latest progress. Jaiswal et al. [28] focused on contrastive learning (CL). SSL research breakthroughs in CV have been achieved in recent years. In this work, we therefore mainly include SSL research derived from the CV community in recent years, especially classic and influential research results.

The objectives of this review are to explain what SSL is, its categories and subcategories, how it differs from and relates to other machine learning paradigms, and its theoretical underpinnings. We present an up-to-date and comprehensive review of the frontiers of visual SSL and divide visual SSL into three parts: context-based, contrastive, and generative SSL, in the hope of sorting out the trends for researchers.

The remainder of this paper is organized as follows. The following sections introduce SSL from the perspectives of its algorithms, theory, applications, three main trends, and open questions (Table 2).

2 ALGORITHMS

In this section, we first introduce what SSL is. Then, we introduce the pretext tasks of SSL and its combinations with other learning paradigms.

TABLE 2: Structure of this paper.

ALGORITHMS
- What is SSL?
- Pretext tasks: context-based methods; CL; generative algorithms
- Combinations with other learning paradigms: generative adversarial networks (GANs); semi-supervised learning; multi-instance learning; multi-view/multi-modal(ality) learning; test time training

THEORY
- Generative algorithms: maximum likelihood estimation (MLE); the original GANs; InfoGAN's disentangling ability; denoising autoencoder (DAE)
- Contrastive algorithms: connection to other unsupervised learning algorithms; connection to supervised learning; connection to metric learning; understanding the contrastive loss based on alignment and uniformity; the relationship between the contrastive loss and mutual information; complete collapse and dimensional collapse

APPLICATIONS
- Image processing and computer vision
- Natural language processing (NLP)
- Other fields

MAIN TRENDS
- Theoretical analysis of SSL
- Automatic design of an optimal pretext task
- A unified SSL paradigm for multiple modalities

OPEN QUESTIONS
- Can SSL benefit from almost unlimited data?
- What is its relationship with multi-modality learning?
- Which SSL algorithm is the best/should I use?
- Do unlabeled data always help?
Fig. 3: Supervised learning versus SSL: instead of using a human-annotated label, SSL derives the label from a co-occurring input. The image is reproduced from [30].

2.1 What is SSL?

Before diving into SSL, we first introduce the concept of unsupervised learning. In unsupervised learning [29], the training data are composed of a set of input vectors x without any corresponding target values. Representative unsupervised learning algorithms include clustering and density estimation.

SSL was possibly first introduced in [30] (Fig. 3), which used the structure present in natural environments across different modalities. For instance, seeing a cow and hearing "mooing" are events that often occur together. Thus, although the sight of a cow does not mean that a cow label should be ascribed, it does co-occur with an example of a "moo". The key is to process the cow image to obtain a self-supervised label for the network so that it can process the "moo" sound, and vice versa. Since then, the machine learning community has further developed self-supervised learning. In SSL, output labels can be 'intrinsically' generated from the input data examples by exposing the relations between parts of the data or different views of the data. The output labels are generated from the data examples themselves. From this definition, an autoencoder (AE) may be seen as one kind of SSL algorithm, in which the output labels are the data themselves. AEs have been widely used in many areas, such as dimensionality reduction and anomaly detection.

In Yann LeCun's keynote talk at ICLR 2020, SSL was described as equal to filling in the blanks (reconstruction), and he gave several forms of SSL (Fig. 4), which are shown as follows.
1) Predict any part of the input from any other part.
2) Predict the future from the past.
3) Predict the invisible from the visible.
4) Predict any occluded, masked, or corrupted part from all available parts.
In SSL, a part of the input is unknown, and the goal is to predict that part.

Jing et al. [27] further extended the meaning of SSL as follows: if a method does not involve any human-annotated labels, the method falls into SSL. In this way, SSL is equal to unsupervised learning. Therefore, generative adversarial networks (GANs) [31] belong to SSL.

Fig. 4: SSL as filling in the blanks along time or space. This figure is reproduced from Yann LeCun's keynote talk at ICLR 2020. The red part is known, and the other part is unknown.

An important concept in the field of SSL is the idea of pretext (also known as surrogate or proxy) tasks. The term "pretext" means that the task being solved is not of true interest but is solved only for the genuine purpose of providing a promising pretrained model. Common pretext tasks include rotation prediction and instance discrimination. To realize different pretext tasks, different loss functions are introduced. As the most important concept in SSL, we first introduce pretext tasks below.

2.2 Pretext tasks

In this section, we summarize the pretext tasks of SSL. A popular SSL solution is to propose a pretext task for networks to solve, and the networks are trained by learning the objective functions of these pretext tasks. Pretext tasks have two common characteristics, as follows. First, features need to be learned by deep learning methods to solve the pretext tasks. Second, the supervised signals are derived from the data themselves (self-supervision). Existing methods generally utilize three types of pretext tasks: context-based methods, CL, and generative algorithms. Here, generative algorithms generally mean masked image modeling (MIM) methods.

2.2.1 Context-based methods

Context-based methods are usually based on the contextual relationships among the given examples, such as their spatial structures and their local and global consistency. We now use rotation as a simple example to demonstrate the concept of context-based pretext tasks and then gradually introduce other tasks.

Rotation: The paradigm that rotation follows involves learning image representations by training neural networks (NNs) to recognize the geometric transformations applied to the original image. For each original image (see "0° rotation" in Fig. 5), Gidaris et al. [33] created three rotated images with 90°, 180°, and 270° rotations. Each image belonged to one of four classes, 0°, 90°, 180°, or 270° rotation, which were the output labels generated from the images themselves. More specifically, there is a set of K = 4 discrete geometric transformations G = {g(·|y)}_{y=1}^{K}, where g(·|y) is the operator that applies a geometric transformation with a label of y to image X to produce the transformed image X^y = g(X|y). Gidaris et al. used a deep convolutional NN (CNN) F(·) to predict rotation; this is a four-class categorization task. The CNN F(·) takes an input image X^{y*} (where y* is unknown to F(·)) and produces a probability distribution over all probable geometric transformations:

$$F(X^{y^*}|\theta) = \{F^y(X^{y^*}|\theta)\}_{y=1}^{K}, \tag{1}$$

where F^y(X^{y*}|θ) is the predicted probability for the geometric transformation with a label of y and θ denotes the learnable parameters of F(·).

Intuitively, a good CNN should be able to correctly categorize the K = 4 classes of natural images. Thus, given a set of N training instances D = {X_i}_{i=1}^{N}, the self-supervised training objective of F(·) is

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} loss(X_i, \theta), \tag{2}$$

where the loss function loss(·) is

$$loss(X_i, \theta) = -\frac{1}{K} \sum_{y=1}^{K} \log\left(F^y(g(X_i|y)|\theta)\right). \tag{3}$$

In [34], the relative rotation angle was constrained to be within the range [−30°, 30°], and the rotations were binned into discrete classes.

Fig. 5: Rotation. For each original image ("0° rotation"), Gidaris et al. [33] created three rotated images with 90°, 180°, and 270° rotations.
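As an illustration of Eqs. (1)-(3), the following is a minimal PyTorch sketch of the rotation pretext task; the helper names are ours, and any network that outputs four logits can play the role of F(·):

```python
import torch
import torch.nn.functional as F_nn

def rotation_batch(x):
    """Build the K = 4 rotated views g(x|y) and their pseudo labels y.

    x: (B, C, H, W) image batch; returns (4B, C, H, W) views and (4B,) labels.
    """
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]  # 0/90/180/270 deg
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(views, dim=0), labels

def rotation_loss(model, x):
    """Eq. (3): cross-entropy, i.e., -log F^y(g(X|y)|theta) averaged over y."""
    views, labels = rotation_batch(x)
    logits = model(views)               # (4B, 4) rotation-class scores
    return F_nn.cross_entropy(logits, labels)
```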
Colorization: Colorization was first proposed in [35], and [35]–[38] showed that colorization can be a powerful pretext task for SSL. Color prediction has the advantageous characteristic that the training data can be totally free. The lightness (L) channel of any color image can be used as the input of an NN system, and the remaining channels in the CIE Lab color space serve as the targets. Given the lightness channel X ∈ R^{H×W×1}, the objective is to predict the ab color channels Y ∈ R^{H×W×2}, where H and W are the height and width dimensionality, respectively. We use Y and Ŷ to denote the ground truth and the predicted value, respectively. A natural objective function minimizes the Frobenius norm between Y and Ŷ:

$$L = \|\hat{Y} - Y\|_F^2. \tag{4}$$

[35] used the multinomial cross-entropy loss rather than (4) to predict the ab color channels. Then, the L channel and the predicted ab color channels can be concatenated to make the original grayscale image colorful.
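The following is a minimal sketch of this colorization objective, assuming scikit-image's rgb2lab for the CIE Lab conversion and a model that maps a (B, 1, H, W) lightness tensor to the (B, 2, H, W) ab channels; the helper names are illustrative:

```python
import torch
from skimage.color import rgb2lab

def lab_pair(rgb):
    """Split an (H, W, 3) RGB float image in [0, 1] into L input and ab target."""
    lab = rgb2lab(rgb)
    L = torch.from_numpy(lab[..., :1]).float().permute(2, 0, 1)   # (1, H, W)
    ab = torch.from_numpy(lab[..., 1:]).float().permute(2, 0, 1)  # (2, H, W)
    return L.unsqueeze(0), ab.unsqueeze(0)                        # add batch dim

def colorization_loss(model, rgb):
    """Eq. (4): squared Frobenius norm between predicted and true ab channels."""
    L, ab = lab_pair(rgb)
    ab_hat = model(L)
    return ((ab_hat - ab) ** 2).sum()
```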
Jigsaw: The jigsaw approach uses jigsaw puzzles as proxy tasks. It relies on the intuition that a network accomplishes the proxy tasks by understanding the contextual information contained in the examples. More specifically, it breaks up pictures into discrete patches, randomly changes their positions, and tries to recover the original order. [39] studied the effect of scaling two self-supervised methods (jigsaw [40]–[43] and colorization) along three dimensions: data size, model capacity, and problem complexity. The results of [39] showed that transfer performance increases log-linearly with the data size. The representation quality also improves with higher-capacity models and increased problem complexity. Works closely related to [40] include [44], [45]. The pretext task of [46], [47] was a conditional motion propagation problem. Noroozi et al. [48] enforced an additional constraint on the feature representation process: the sum of the feature representations of all image patches should be approximately equal to the feature representation of the whole image.

Many pretext tasks lead to representations that are covariant with image transformations. [49] argued that semantic representations should be invariant under such transformations and developed a pretext-invariant representation learning (PIRL) approach that learns invariant representations based on pretext tasks.

2.2.2 CL

Following simple instance discrimination tasks [50]–[52], many CL-based SSL methods have emerged, such as momentum contrast (MoCo) v1 [53], MoCo v2 [54], MoCo v3 [55], a simple framework for CL of visual representations (SimCLR) v1 [56], and SimCLR v2. Classic algorithms such as MoCo have pushed the performance of self-supervised pretraining to a level comparable to that of supervised learning, making SSL relevant for large-scale applications for the first time.

Early CL approaches were constructed based on the idea of negative examples. With the development of CL, a number of CL methods that do not use negative examples have emerged. They follow different ideas, such as self-distillation and feature decorrelation, but they all obey the idea of positive example consistency. We describe the different available CL methods below.

CL based on negative examples follows a similar pretext task: instance discrimination. The basic idea is to make positive examples close to each other and negative examples far from each other in the latent space. The exact way in which positive and negative examples are defined varies according to the given modality and other factors, which can include spatial and temporal consistency in video understanding or the co-occurrence between modalities in multi-modal learning.

MoCo: He et al. [53] viewed CL as a dictionary look-up task. Consider an encoded query q and several encoded keys. Assume that a single key (denoted as k+) in the dictionary matches q. A contrastive loss [58] is a function whose value is low if q is similar to its positive key k+ and dissimilar to all other (negative) keys. With similarity measured by the dot product, one contrastive loss function form called InfoNCE [59] was considered in MoCo v1 [53]:

$$L_q = -\log \frac{\exp(q \cdot k_+/\tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i/\tau)}, \tag{5}$$

where τ denotes the temperature hyperparameter. The sum is calculated over one positive example and K negative examples. InfoNCE was derived from noise contrastive estimation (NCE) [60], whose objective is

$$L = -\log \frac{\exp(q \cdot k_+/\tau)}{\exp(q \cdot k_+/\tau) + \exp(q \cdot k_-/\tau)}, \tag{6}$$

where q is similar to a positive example k+ and dissimilar to a negative example k−. Based on MoCo v1 [53] and SimCLR v1 [56], MoCo v2 [54] uses an MLP projection head and more data augmentations.

SimCLR: SimCLR v1 [56] randomly samples a minibatch of N instances and defines a contrastive prediction task on pairs of augmented instances derived from the minibatch, producing 2N instances. SimCLR v1 does not explicitly sample negative instances. Instead, given a positive pair, SimCLR v1 treats the other 2(N−1) augmented instances in the minibatch as negative instances. Let sim(u, v) = u^T v / (||u|| ||v||) denote the cosine similarity between two instances u and v. Then, the loss function of SimCLR v1 for a positive pair of instances (i, j) is

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}, \tag{7}$$

where 1_{[k≠i]} ∈ {0, 1} is an indicator function equal to 1 iff k ≠ i and τ is the temperature hyperparameter. The final loss is computed over all positive pairs, both (i, j) and (j, i), in the minibatch.

Both MoCo and SimCLR require data augmentation techniques such as cropping, resizing, and color distortion. Other augmentation methods are also available [61]. For example, [62] estimated the foreground saliency levels in images and created augmentations by copying and pasting the image foregrounds onto different backgrounds, such as homogeneous grayscale images with random grayscale levels, texture images, and ImageNet images. However, why augmentation helps and how we can perform more effective augmentations are still unclear and require further study.
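In practice, the InfoNCE loss of Eq. (5) is often implemented as a (K+1)-way softmax classification problem in which the positive key is class 0. The sketch below follows that recipe (shapes and names are our assumptions, not MoCo's released code):

```python
import torch
import torch.nn.functional as F_nn

def info_nce(q, k_pos, k_neg, tau=0.07):
    """Eq. (5). q: (B, D) queries; k_pos: (B, D) positive keys;
    k_neg: (K, D) negative keys. All vectors are assumed L2-normalized."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)      # (B, 1) dot products q·k+
    l_neg = q @ k_neg.t()                             # (B, K) dot products q·k_i
    logits = torch.cat([l_pos, l_neg], dim=1) / tau   # positive key is class 0
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F_nn.cross_entropy(logits, labels)         # -log softmax at the positive
```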
CL methods based on self-distillation: Bootstrap your own latent (BYOL) [63] is a representative self-distillation algorithm. BYOL was proposed for self-supervised image representation learning without using negative pairs. BYOL uses two NNs, which are called the online and target networks. Similar to MoCo [53], BYOL updates the target network with a slow-moving average of the online network.

Fig. 6: Comparison of Siamese architectures: SimCLR, BYOL, SwAV, and SimSiam. The branches differ in whether they share weights, use a predictor, use a momentum (moving-average) encoder, and propagate or stop gradients. This figure is reproduced from [65].

Siamese networks such as SimCLR, BYOL, and SwAV [64] have become common structures in various recently developed models for self-supervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for preventing collapsing solutions. [65] proposed simple Siamese (SimSiam) networks that can learn useful representations without using negative sample pairs, large batches, or momentum encoders. For each data point x, we have two randomly augmented views x1 and x2. An encoder f and an MLP prediction head h are used to process the two views. Denoting the two outputs by p1 = h(f(x1)) and z2 = f(x2), [65] minimized their negative cosine similarity

$$D(p_1, z_2) = -\frac{p_1}{\|p_1\|_2} \cdot \frac{z_2}{\|z_2\|_2}, \tag{8}$$

where ||·||_2 is the ℓ2-norm. Similar to [63], [65] defined a symmetrized loss as

$$L = \frac{1}{2}\left(D(p_1, z_2) + D(p_2, z_1)\right), \tag{9}$$

where this loss is defined for the example x and the total loss is the average over all examples. More importantly, [65] used a stop-gradient (stopgrad) operation by revising (8) as follows:

$$D(p_1, \mathrm{stopgrad}(z_2)), \tag{10}$$

which means that z2 is treated as a constant. Analogously, (9) is revised as

$$L = \frac{1}{2}\left(D(p_1, \mathrm{stopgrad}(z_2)) + D(p_2, \mathrm{stopgrad}(z_1))\right). \tag{11}$$

The architectures of SimCLR, BYOL, SwAV, and SimSiam are shown in Fig. 6. Since BYOL and SimSiam do not use negative examples, whether they belong to CL is debatable; we consider them to belong to CL in this paper.
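A minimal PyTorch sketch of Eqs. (8)-(11) follows, with detach() playing the role of stopgrad; f and h stand for any encoder and prediction MLP (our notation, mirroring the symbols above):

```python
import torch.nn.functional as F_nn

def D(p, z):
    """Eqs. (8) and (10): negative cosine similarity, z treated as a constant."""
    return -F_nn.cosine_similarity(p, z.detach(), dim=1).mean()

def simsiam_loss(f, h, x1, x2):
    """Eq. (11): symmetrized loss over the two augmented views x1 and x2."""
    z1, z2 = f(x1), f(x2)
    p1, p2 = h(z1), h(z2)
    return 0.5 * D(p1, z2) + 0.5 * D(p2, z1)
```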
CL methods based on feature decorrelation: Feature decorrelation aims to learn decorrelated features.

Barlow twins: Barlow twins [67] were proposed with a novel loss function that makes the embedding vectors of distorted versions of an example similar while minimizing the redundancy between the components of these vectors. More specifically, similar to other SSL methods [53], [56], Barlow twins produce two distorted views for all images of a batch X sampled from a database and feed them to two networks, finally producing batches of embeddings ZA and ZB, respectively. The objective function of Barlow twins is

$$L_{BT} = \sum_i (1 - C_{ii})^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2, \tag{12}$$

where λ is a trade-off constant and C is the cross-correlation matrix computed between the outputs of the two equivalent networks along the batch dimension:

$$C_{ij} = \frac{\sum_b z^A_{b,i} z^B_{b,j}}{\sqrt{\sum_b \left(z^A_{b,i}\right)^2} \sqrt{\sum_b \left(z^B_{b,j}\right)^2}}, \tag{13}$$

where b is the batch example index and i and j are the vector dimension indices of the network outputs. C is a square matrix with a size equal to the dimensionality of the network output.

VICReg: Similar to Barlow twins [67], variance-invariance-covariance regularization (VICReg) operates on two batches of embeddings ZA and ZB. Barlow twins consider a cross-correlation matrix, while VICReg considers variance, invariance, and covariance. Let d, n, and z_j denote the dimensionality of the vectors in ZA, the batch size, and the vector consisting of the values at dimension j across all examples of ZA, respectively. The variance regularization term v of VICReg is defined as a hinge loss function on the standard deviation of the embeddings along the batch dimension:

$$v(Z^A) = \frac{1}{d} \sum_{j=1}^{d} \max\left(0, \gamma - S(z_j, \varepsilon)\right), \tag{14}$$

where S is the regularized standard deviation, which is defined as

$$S(y, \varepsilon) = \sqrt{\mathrm{Var}(y) + \varepsilon}, \tag{15}$$

where γ is a constant target value for the standard deviation (set to 1 in the experiments) and ε is a small scalar for preventing numerical instabilities. This criterion encourages the standard deviation along the batch dimension to stay above γ for every dimension, preventing collapse in cases where all data are mapped to the same vector. The invariance criterion s of VICReg between ZA and ZB is defined as the mean-squared Euclidean distance between each pair of embedding vectors.
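To close this subsection, here is a minimal PyTorch sketch of the Barlow twins objective of Eqs. (12)-(13) and the VICReg variance term of Eqs. (14)-(15); the hyperparameter defaults and helper names are illustrative assumptions:

```python
import torch

def barlow_twins_loss(ZA, ZB, lambd=5e-3, eps=1e-12):
    """Eqs. (12)-(13). ZA, ZB: (n, d) embedding batches of the two views,
    assumed mean-centered along the batch dimension."""
    num = ZA.t() @ ZB                                     # (d, d): sum_b z_i z_j
    denom = ZA.norm(dim=0).unsqueeze(1) * ZB.norm(dim=0).unsqueeze(0)
    C = num / (denom + eps)                               # Eq. (13)
    on_diag = (1.0 - torch.diagonal(C)).pow(2).sum()      # invariance term
    off_diag = (C - torch.diag(torch.diagonal(C))).pow(2).sum()  # redundancy term
    return on_diag + lambd * off_diag                     # Eq. (12)

def vicreg_variance(Z, gamma=1.0, eps=1e-4):
    """Eqs. (14)-(15): hinge loss on the per-dimension std along the batch."""
    S = torch.sqrt(Z.var(dim=0) + eps)                    # Eq. (15)
    return torch.relu(gamma - S).mean()                   # Eq. (14), mean over d
```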