A Survey of Self-Supervised Learning from Multiple Perspectives: Algorithms, Theory, Applications and Future Trends
1 INTRODUCTION

Deep supervised learning algorithms have achieved satisfactory performance in fields such as computer vision (CV) and natural language processing (NLP). Generally, supervised learning algorithms need large numbers of labeled examples to obtain better performance. Models trained on large-scale databases such as ImageNet are widely utilized as pretrained models and then fine-tuned for other tasks (Table 1) for the following two main reasons. First, the parameters learned on large-scale databases provide a good starting point, so networks trained on other tasks can converge more quickly. Second, a network trained on a large-scale database has already learned the relevant hierarchical characteristics, which can help lessen the overfitting problem during the training processes of other tasks, especially when the training instances of other tasks are few or the training labels are limited.

Unfortunately, in many real data mining and machine learning applications, although many unlabeled training instances can be found, usually only a limited number of labeled training instances are available. Labeled examples are often expensive, difficult, or time-consuming to obtain since they require the efforts of experienced human annotators. For instance, in web user profile analysis, it is easy to collect many web user profiles, but labeling the non-profitable or profitable users in these data requires inspection, judgment, and even time-consuming tracing tasks to be performed by experienced human assessors, which is very expensive. As another case, in the medical field, unlabeled examples can be easily obtained from routine medical examinations, but making diagnoses for so many examples in a case-by-case manner imposes a heavy burden on medical experts. For example, to perform breast cancer diagnosis, radiologists must assign labels to every focus in a large number of easily obtained high-resolution mammograms. This process is often very inefficient and time-consuming. Furthermore, supervised learning methods suffer from spurious correlations and generalization errors, and they are vulnerable to adversarial attacks.

To alleviate the two aforementioned limitations of supervised learning, many machine learning paradigms have been proposed, such as active learning, semi-supervised learning, and self-supervised learning (SSL). This paper focuses on SSL. SSL algorithms have been proposed to learn good features from a large number of unlabeled instances without using any human annotations.

The general pipeline of SSL is shown in Fig. 1. During the self-supervised pretraining phase, a predefined pretext task is designed for a deep learning algorithm to solve, and the pseudo labels for the pretext task are automatically generated based on certain attributes of the input data. Then, the deep learning algorithm is trained to solve the pretext task. After the self-supervised pretraining process is completed, the learned model can be further transferred to downstream tasks (especially those for which only a relatively small number of examples are available) as a pretrained model to improve performance and overcome overfitting issues.

TABLE 1: Contrast between supervised and self-supervised pretraining and fine-tuning.

Supervised pretraining (labeled data):
- Image categorization → detection/segmentation/pose estimation/depth estimation, etc.
- Video action categorization → action recognition/object tracking, etc.

Self-supervised pretraining (unlabeled data):
- Image: rotation, jigsaw, etc. → detection/segmentation/pose estimation/depth estimation, etc.
- Video: the order of frames, playing direction, etc. → action recognition/object tracking, etc.
- NLP: masked language modeling → question answering/textual entailment recognition/natural language inference, etc.

Fig. 1: The general pipeline of SSL. Unlabeled data are used to pretrain a model on SSL (pretext) tasks, and the learned parameters then serve as the initialization for downstream tasks, in contrast to supervised pretraining on labeled data followed by transfer.

Fig. 2: Google Scholar search results for "self-supervised learning". The vertical and horizontal axes denote the number of SSL publications and the year, respectively.
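To make the pretrain-then-transfer pipeline described above concrete, here is a minimal PyTorch-style sketch (our illustration, not code from any surveyed paper; the tiny backbone and the class counts are hypothetical placeholders):

```python
import torch.nn as nn

# Backbone to be pretrained on the pretext task and later transferred.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

# 1) Self-supervised pretraining: solve a pretext task (e.g., 4-way
#    rotation prediction) whose pseudo labels are generated from the
#    data themselves, with no human annotation.
pretext_model = nn.Sequential(backbone, nn.Linear(16, 4))
# ... train pretext_model on (transformed input, pseudo label) pairs ...

# 2) Transfer: reuse the pretrained backbone as the initialization and
#    fine-tune a new head on the (small) labeled downstream dataset.
downstream_model = nn.Sequential(backbone, nn.Linear(16, 10))
# ... fine-tune downstream_model on labeled downstream data ...
```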
Because no human annotations are required to generate pseudo labels during self-supervised training, one main merit of SSL algorithms is that they can make the most of large-scale unlabeled data. Trained with these pseudo labels, self-supervised algorithms have achieved promising results, and the performance gap between self-supervised and supervised algorithms in downstream tasks has decreased. Asano et al. [1] showed that, even on only a single image, SSL can surprisingly produce low-level characteristics that generalize well.

SSL [2]–[19] has recently attracted increasing attention (Fig. 2). Yann LeCun, one of the recipients of the ACM A.M. Turing Award, gave a keynote talk at the Eighth International Conference on Learning Representations (ICLR 2020) titled "The future is self-supervised". Yann LeCun and Yoshua Bengio, who both received the Turing Award, said that SSL is key to human-level intelligence [20]. According to Google Scholar, a large number of papers related to SSL have already been published; for example, the SSL-related papers published in 2021 amount to approximately 52 papers every day, or more than two papers per hour (Fig. 2). To prevent researchers from becoming lost in so many SSL papers and to collate the latest research findings, we attempt to provide a timely survey of this topic.

Differences From Previous Work: Reviews of SSL are available for specific applications such as recommender systems [21], graphs [22], [23], sequential transfer learning [24], videos [25], adversarial pretraining of self-supervised deep networks [26], and visual feature learning [27]. Liu et al. [18] mainly covered papers written before 2020, and their work did not contain the latest progress. Jaiswal et al. [28] focused on contrastive learning (CL). SSL research breakthroughs in CV have been achieved in recent years. In this work, we therefore mainly include SSL research derived from the CV community in recent years, especially classic and influential research results.

The objectives of this review are to explain what SSL is, its categories and subcategories, how it differs from and relates to other machine learning paradigms, and its theoretical underpinnings. We present an up-to-date and comprehensive review of the frontiers of visual SSL and divide visual SSL into three parts: context-based, contrastive, and generative SSL, in the hope of sorting out the trends for researchers.

The remainder of this paper is organized as follows. The following sections introduce SSL from the perspectives of its algorithms, theory, applications, three main trends, and open questions (Table 2).

2 ALGORITHMS

In this section, we first introduce what SSL is. Then, we introduce the pretext tasks of SSL and its combinations with other learning paradigms.

TABLE 2: Structure of this paper.

ALGORITHMS
- What is SSL?
- Pretext tasks: context-based methods; CL; generative algorithms
- Combinations with other learning paradigms: generative adversarial networks (GANs); semi-supervised learning; multi-instance learning; multi-view/multi-modal(ality) learning; test time training

THEORY
- Generative algorithms: maximum likelihood estimation (MLE); the original GANs; InfoGAN's disentangling ability; denoising autoencoder (DAE)
- Contrastive algorithms: connection to other unsupervised learning algorithms; connection to supervised learning; connection to metric learning; understanding the contrastive loss based on alignment and uniformity; the relationship between the contrastive loss and mutual information; complete collapse and dimensional collapse

APPLICATIONS
- Image processing and computer vision
- Natural language processing (NLP)
- Other fields

MAIN TRENDS
- Theoretical analysis of SSL
- Automatic design of an optimal pretext task
- A unified SSL paradigm for multiple modalities

OPEN QUESTIONS
- Can SSL benefit from almost unlimited data?
- What is its relationship with multi-modality learning?
- Which SSL algorithm is the best/should I use?
- Do unlabeled data always help?
Fig. 3: Supervised learning versus SSL: instead of using a human-annotated label, SSL derives the label from a co-occurring input. The image is reproduced from [30].

2.1 What is SSL?

Before diving into SSL, we first introduce the concept of unsupervised learning. In unsupervised learning [29], the training data are composed of a set of input vectors x without any corresponding target values. Representative unsupervised learning algorithms include clustering and density estimation.

SSL was possibly first introduced in [30] (Fig. 3), which used the structure present in natural environments across different modalities. For instance, seeing a cow and hearing "mooing" are events that often occur together. Thus, although the sight of a cow does not mean that a cow label should be ascribed, it does co-occur with an example of a "moo". The key is to process the cow image to obtain a self-supervised label for the network so that it can process the "moo" sound, and vice versa. Since then, the machine learning community has further developed self-supervised learning. In SSL, output labels can be 'intrinsically' generated from the input data examples by exposing the relations between parts of the data or different views of the data. The output labels are generated from the data examples themselves. From this definition, an autoencoder (AE) may be seen as one kind of SSL algorithm, in which the output labels are the data themselves. AEs have been widely used in many areas, such as dimensionality reduction and anomaly detection.

In Yann LeCun's keynote talk at ICLR 2020, SSL was described as equal to filling in the blanks (reconstruction), and he gave several forms of SSL (Fig. 4), which are shown as follows.
1) Predict any part of the input from any other part.
2) Predict the future from the past.
3) Predict the invisible from the visible.
4) Predict any occluded, masked, or corrupted part from all available parts.
In SSL, a part of the input is unknown, and the goal is to predict that part.

Jing et al. [27] further extended the meaning of SSL as follows: if a method does not involve any human-annotated labels, the method falls into SSL. In this way, SSL is equal to unsupervised learning. Therefore, generative adversarial networks (GANs) [31] belong to SSL.

Fig. 4: SSL as filling in the blanks along time or space. This figure is reproduced from Yann LeCun's keynote talk at ICLR 2020. The red part is known, and the other part is unknown.

An important concept in the field of SSL is the idea of pretext (also known as surrogate or proxy) tasks. The term "pretext" means that the task being solved is not of true interest but is solved only for the genuine purpose of providing a promising pretrained model. Common pretext tasks include rotation prediction and instance discrimination. To realize different pretext tasks, different loss functions are introduced. As the most important concept in SSL, we first introduce pretext tasks below.

2.2 Pretext tasks

In this section, we summarize the pretext tasks of SSL. A popular SSL solution is to propose a pretext task for networks to solve, and the networks are trained by learning the objective functions of these pretext tasks. Pretext tasks have two common characteristics, as follows. First, features need to be learned by deep learning methods to solve the pretext tasks. Second, the supervised signals are derived from the data themselves (self-supervision). Existing methods generally utilize three types of pretext tasks: context-based methods, CL, and generative algorithms. Here, generative algorithms generally mean masked image modeling (MIM) methods.

2.2.1 Context-based methods

Context-based methods are usually based on the contextual relationships among the given examples, such as their spatial structures and their local and global consistency. We now use rotation as a simple example to demonstrate the concept of context-based pretext tasks and then gradually introduce other tasks.

Rotation: The paradigm that rotation follows involves learning image representations by training neural networks (NNs) to recognize the geometric transformations applied to the original image. For each original image (see "0° rotation" in Fig. 5), Gidaris et al. [33] created three rotated images with 90°, 180°, and 270° rotations. Each image belonged to one of four classes, 0°, 90°, 180°, or 270° rotation, which were the output labels generated from the images themselves. More specifically, there is a set of K = 4 discrete geometric transformations G = {g(·|y)}_{y=1}^{K}, where g(·|y) is the operator that applies a geometric transformation with a label of y to image X to produce the transformed image X^y = g(X|y). Gidaris et al. used a deep convolutional NN (CNN) F(·) to predict rotation; this is a four-class categorization task. The CNN F(·) takes an input image X^{y*} (where y* is unknown to F(·)) and produces a probability distribution over all probable geometric transformations:

$$F(X^{y^*}|\theta) = \{F^y(X^{y^*}|\theta)\}_{y=1}^{K}, \tag{1}$$

where F^y(X^{y*}|θ) is the predicted probability for the geometric transformation with a label of y and θ denotes the learnable parameters of F(·).

Intuitively, a good CNN should be able to correctly categorize the K = 4 classes of natural images. Thus, given a set of N training instances D = {X_i}_{i=1}^{N}, the self-supervised training objective of F(·) is

$$\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} loss(X_i, \theta), \tag{2}$$

where the loss function loss(·) is

$$loss(X_i, \theta) = -\frac{1}{K} \sum_{y=1}^{K} \log\left(F^y(g(X_i|y)|\theta)\right). \tag{3}$$

In [34], the relative rotation angle was constrained to be within the range [−30°, 30°], and the rotations were binned into discrete classes.

Fig. 5: Rotation. For each original image ("0° rotation"), Gidaris et al. [33] created three rotated images with 90°, 180°, and 270° rotations.
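As an illustration of Eqs. (1)-(3), the following is a minimal PyTorch sketch of the rotation pretext task; the helper names are ours, and any network that outputs four logits can play the role of F(·):

```python
import torch
import torch.nn.functional as F_nn

def rotation_batch(x):
    """Build the K = 4 rotated views g(x|y) and their pseudo labels y.

    x: (B, C, H, W) image batch; returns (4B, C, H, W) views and (4B,) labels.
    """
    views = [torch.rot90(x, k, dims=(2, 3)) for k in range(4)]  # 0/90/180/270 deg
    labels = torch.arange(4, device=x.device).repeat_interleave(x.size(0))
    return torch.cat(views, dim=0), labels

def rotation_loss(model, x):
    """Eq. (3): cross-entropy, i.e., -log F^y(g(X|y)|theta) averaged over y."""
    views, labels = rotation_batch(x)
    logits = model(views)               # (4B, 4) rotation-class scores
    return F_nn.cross_entropy(logits, labels)
```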
Colorization: Colorization was first proposed in [35], and [35]–[38] showed that colorization can be a powerful pretext task for SSL. Color prediction has the advantageous characteristic that the training data can be totally free. The lightness (L) channel of any color image can be used as the input of an NN system, and the remaining channels in the CIE Lab color space serve as the targets. Given the lightness channel X ∈ R^{H×W×1}, the objective is to predict the ab color channels Y ∈ R^{H×W×2}, where H and W are the height and width dimensionality, respectively. We use Y and Ŷ to denote the ground truth and the predicted value, respectively. A natural objective function minimizes the Frobenius norm between Y and Ŷ:

$$L = \|\hat{Y} - Y\|_F^2. \tag{4}$$

[35] used the multinomial cross-entropy loss rather than (4) to predict the ab color channels. Then, the L channel and the predicted ab color channels can be concatenated to make the original grayscale image colorful.
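The following is a minimal sketch of this colorization objective, assuming scikit-image's rgb2lab for the CIE Lab conversion and a model that maps a (B, 1, H, W) lightness tensor to the (B, 2, H, W) ab channels; the helper names are illustrative:

```python
import torch
from skimage.color import rgb2lab

def lab_pair(rgb):
    """Split an (H, W, 3) RGB float image in [0, 1] into L input and ab target."""
    lab = rgb2lab(rgb)
    L = torch.from_numpy(lab[..., :1]).float().permute(2, 0, 1)   # (1, H, W)
    ab = torch.from_numpy(lab[..., 1:]).float().permute(2, 0, 1)  # (2, H, W)
    return L.unsqueeze(0), ab.unsqueeze(0)                        # add batch dim

def colorization_loss(model, rgb):
    """Eq. (4): squared Frobenius norm between predicted and true ab channels."""
    L, ab = lab_pair(rgb)
    ab_hat = model(L)
    return ((ab_hat - ab) ** 2).sum()
```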
Jigsaw: The jigsaw approach uses jigsaw puzzles as proxy tasks. It relies on the intuition that a network accomplishes the proxy tasks by understanding the contextual information contained in the examples. More specifically, it breaks up pictures into discrete patches, randomly changes their positions, and tries to recover the original order. [39] studied the effect of scaling two self-supervised methods (jigsaw [40]–[43] and colorization) along three dimensions: data size, model capacity, and problem complexity. The results of [39] showed that transfer performance increases log-linearly with the data size. The representation quality also improves with higher-capacity models and increased problem complexity. Works closely related to [40] include [44], [45]. The pretext task of [46], [47] was a conditional motion propagation problem. Noroozi et al. [48] enforced an additional constraint on the feature representation process: the sum of the feature representations of all image patches should be approximately equal to the feature representation of the whole image.

Many pretext tasks lead to representations that are covariant with image transformations. [49] argued that semantic representations should be invariant under such transformations and developed a pretext-invariant representation learning (PIRL) approach that learns invariant representations based on pretext tasks.

2.2.2 CL

Following simple instance discrimination tasks [50]–[52], many CL-based SSL methods have emerged, such as momentum contrast (MoCo) v1 [53], MoCo v2 [54], MoCo v3 [55], a simple framework for CL of visual representations (SimCLR) v1 [56], and SimCLR v2. Classic algorithms such as MoCo have pushed the performance of self-supervised pretraining to a level comparable to that of supervised learning, making SSL relevant for large-scale applications for the first time.

Early CL approaches were constructed based on the idea of negative examples. With the development of CL, a number of CL methods that do not use negative examples have emerged. They follow different ideas, such as self-distillation and feature decorrelation, but they all obey the idea of positive example consistency. We describe the different available CL methods below.

CL based on negative examples follows a similar pretext task: instance discrimination. The basic idea is to make positive examples close to each other and negative examples far from each other in the latent space. The exact way in which positive and negative examples are defined varies according to the given modality and other factors, which can include spatial and temporal consistency in video understanding or the co-occurrence between modalities in multi-modal learning.

MoCo: He et al. [53] viewed CL as a dictionary look-up task. Consider an encoded query q and several encoded keys. Assume that a single key (denoted as k+) in the dictionary matches q. A contrastive loss [58] is a function whose value is low if q is similar to its positive key k+ and dissimilar to all other (negative) keys. With similarity measured by the dot product, one contrastive loss function form called InfoNCE [59] was considered in MoCo v1 [53]:

$$L_q = -\log \frac{\exp(q \cdot k_+/\tau)}{\sum_{i=0}^{K} \exp(q \cdot k_i/\tau)}, \tag{5}$$

where τ denotes the temperature hyperparameter. The sum is calculated over one positive example and K negative examples. InfoNCE was derived from noise contrastive estimation (NCE) [60], whose objective is

$$L = -\log \frac{\exp(q \cdot k_+/\tau)}{\exp(q \cdot k_+/\tau) + \exp(q \cdot k_-/\tau)}, \tag{6}$$

where q is similar to a positive example k+ and dissimilar to a negative example k−. Based on MoCo v1 [53] and SimCLR v1 [56], MoCo v2 [54] uses an MLP projection head and more data augmentations.

SimCLR: SimCLR v1 [56] randomly samples a minibatch of N instances and defines a contrastive prediction task on pairs of augmented instances derived from the minibatch, producing 2N instances. SimCLR v1 does not explicitly sample negative instances. Instead, given a positive pair, SimCLR v1 treats the other 2(N−1) augmented instances in the minibatch as negative instances. Let sim(u, v) = u^T v / (||u|| ||v||) denote the cosine similarity between two instances u and v. Then, the loss function of SimCLR v1 for a positive pair of instances (i, j) is

$$\ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)}, \tag{7}$$

where 1_{[k≠i]} ∈ {0, 1} is an indicator function equal to 1 iff k ≠ i and τ is the temperature hyperparameter. The final loss is computed over all positive pairs, both (i, j) and (j, i), in the minibatch.

Both MoCo and SimCLR require data augmentation techniques such as cropping, resizing, and color distortion. Other augmentation methods are also available [61]. For example, [62] estimated the foreground saliency levels in images and created augmentations by copying and pasting the image foregrounds onto different backgrounds, such as homogeneous grayscale images with random grayscale levels, texture images, and ImageNet images. However, why augmentation helps and how we can perform more effective augmentations are still unclear and require further study.
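In practice, the InfoNCE loss of Eq. (5) is often implemented as a (K+1)-way softmax classification problem in which the positive key is class 0. The sketch below follows that recipe (shapes and names are our assumptions, not MoCo's released code):

```python
import torch
import torch.nn.functional as F_nn

def info_nce(q, k_pos, k_neg, tau=0.07):
    """Eq. (5). q: (B, D) queries; k_pos: (B, D) positive keys;
    k_neg: (K, D) negative keys. All vectors are assumed L2-normalized."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)      # (B, 1) dot products q·k+
    l_neg = q @ k_neg.t()                             # (B, K) dot products q·k_i
    logits = torch.cat([l_pos, l_neg], dim=1) / tau   # positive key is class 0
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F_nn.cross_entropy(logits, labels)         # -log softmax at the positive
```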
CL methods based on self-distillation: Bootstrap your own latent (BYOL) [63] is a representative self-distillation algorithm. BYOL was proposed for self-supervised image representation learning without using negative pairs. BYOL uses two NNs, which are called the online and target networks. Similar to MoCo [53], BYOL updates the target network with a slow-moving average of the online network.

Fig. 6: Comparison of Siamese architectures: SimCLR, BYOL, SwAV, and SimSiam. The branches differ in whether they share weights, use a predictor, use a momentum (moving-average) encoder, and propagate or stop gradients. This figure is reproduced from [65].

Siamese networks such as SimCLR, BYOL, and SwAV [64] have become common structures in various recently developed models for self-supervised visual representation learning. These models maximize the similarity between two augmentations of one image, subject to certain conditions for preventing collapsing solutions. [65] proposed simple Siamese (SimSiam) networks that can learn useful representations without using negative sample pairs, large batches, or momentum encoders. For each data point x, we have two randomly augmented views x1 and x2. An encoder f and an MLP prediction head h are used to process the two views. Denoting the two outputs by p1 = h(f(x1)) and z2 = f(x2), [65] minimized their negative cosine similarity

$$D(p_1, z_2) = -\frac{p_1}{\|p_1\|_2} \cdot \frac{z_2}{\|z_2\|_2}, \tag{8}$$

where ||·||_2 is the ℓ2-norm. Similar to [63], [65] defined a symmetrized loss as

$$L = \frac{1}{2}\left(D(p_1, z_2) + D(p_2, z_1)\right), \tag{9}$$

where this loss is defined for the example x and the total loss is the average over all examples. More importantly, [65] used a stop-gradient (stopgrad) operation by revising (8) as follows:

$$D(p_1, \mathrm{stopgrad}(z_2)), \tag{10}$$

which means that z2 is treated as a constant. Analogously, (9) is revised as

$$L = \frac{1}{2}\left(D(p_1, \mathrm{stopgrad}(z_2)) + D(p_2, \mathrm{stopgrad}(z_1))\right). \tag{11}$$

The architectures of SimCLR, BYOL, SwAV, and SimSiam are shown in Fig. 6. Since BYOL and SimSiam do not use negative examples, whether they belong to CL is debatable; we consider them to belong to CL in this paper.
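A minimal PyTorch sketch of Eqs. (8)-(11) follows, with detach() playing the role of stopgrad; f and h stand for any encoder and prediction MLP (our notation, mirroring the symbols above):

```python
import torch.nn.functional as F_nn

def D(p, z):
    """Eqs. (8) and (10): negative cosine similarity, z treated as a constant."""
    return -F_nn.cosine_similarity(p, z.detach(), dim=1).mean()

def simsiam_loss(f, h, x1, x2):
    """Eq. (11): symmetrized loss over the two augmented views x1 and x2."""
    z1, z2 = f(x1), f(x2)
    p1, p2 = h(z1), h(z2)
    return 0.5 * D(p1, z2) + 0.5 * D(p2, z1)
```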
CL methods based on feature decorrelation: Feature decorrelation aims to learn decorrelated features.

Barlow twins: Barlow twins [67] were proposed with a novel loss function that makes the embedding vectors of distorted versions of an example similar while minimizing the redundancy between the components of these vectors. More specifically, similar to other SSL methods [53], [56], Barlow twins produce two distorted views for all images of a batch X sampled from a database and feed them to two networks, finally producing batches of embeddings ZA and ZB, respectively. The objective function of Barlow twins is

$$L_{BT} = \sum_i (1 - C_{ii})^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2, \tag{12}$$

where λ is a trade-off constant and C is the cross-correlation matrix computed between the outputs of the two equivalent networks along the batch dimension:

$$C_{ij} = \frac{\sum_b z^A_{b,i} z^B_{b,j}}{\sqrt{\sum_b \left(z^A_{b,i}\right)^2} \sqrt{\sum_b \left(z^B_{b,j}\right)^2}}, \tag{13}$$

where b is the batch example index and i and j are the vector dimension indices of the network outputs. C is a square matrix with a size equal to the dimensionality of the network output.

VICReg: Similar to Barlow twins [67], variance-invariance-covariance regularization (VICReg) operates on two batches of embeddings ZA and ZB. Barlow twins consider a cross-correlation matrix, while VICReg considers variance, invariance, and covariance. Let d, n, and z_j denote the dimensionality of the vectors in ZA, the batch size, and the vector consisting of the values at dimension j across all examples of ZA, respectively. The variance regularization term v of VICReg is defined as a hinge loss function on the standard deviation of the embeddings along the batch dimension:

$$v(Z^A) = \frac{1}{d} \sum_{j=1}^{d} \max\left(0, \gamma - S(z_j, \varepsilon)\right), \tag{14}$$

where S is the regularized standard deviation, which is defined as

$$S(y, \varepsilon) = \sqrt{\mathrm{Var}(y) + \varepsilon}, \tag{15}$$

where γ is a constant target value for the standard deviation (set to 1 in the experiments) and ε is a small scalar for preventing numerical instabilities. This criterion encourages the standard deviation along the batch dimension to stay above γ for every dimension, preventing collapse in cases where all data are mapped to the same vector. The invariance criterion s of VICReg between ZA and ZB is defined as the mean-squared Euclidean distance between each pair of embedding vectors.
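To close this subsection, here is a minimal PyTorch sketch of the Barlow twins objective of Eqs. (12)-(13) and the VICReg variance term of Eqs. (14)-(15); the hyperparameter defaults and helper names are illustrative assumptions:

```python
import torch

def barlow_twins_loss(ZA, ZB, lambd=5e-3, eps=1e-12):
    """Eqs. (12)-(13). ZA, ZB: (n, d) embedding batches of the two views,
    assumed mean-centered along the batch dimension."""
    num = ZA.t() @ ZB                                     # (d, d): sum_b z_i z_j
    denom = ZA.norm(dim=0).unsqueeze(1) * ZB.norm(dim=0).unsqueeze(0)
    C = num / (denom + eps)                               # Eq. (13)
    on_diag = (1.0 - torch.diagonal(C)).pow(2).sum()      # invariance term
    off_diag = (C - torch.diag(torch.diagonal(C))).pow(2).sum()  # redundancy term
    return on_diag + lambd * off_diag                     # Eq. (12)

def vicreg_variance(Z, gamma=1.0, eps=1e-4):
    """Eqs. (14)-(15): hinge loss on the per-dimension std along the batch."""
    S = torch.sqrt(Z.var(dim=0) + eps)                    # Eq. (15)
    return torch.relu(gamma - S).mean()                   # Eq. (14), mean over d
```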