How Big Data Can Help Small Data, by Fei Sha, Associate Professor, University of Southern California

How big data can help small data?
Department of Computer Science, University of Southern California

Edge case study
  • "We often run into occasional large […]"
  • Quant finance: the "left tail" / "right tail" does not have enough data

Timescales
  • Short-term: Which school? Extracurricular activities: sports, arts, etc.?
  • Mid-term: Which university? Which speciality?
  • Long-term: What kind of career path?

What else makes this hard?
  • Individual differences: genetic, environmental
  • Harsh constraints: one-shot game, costly to recover from mistakes
[Bill & Melinda Gates Foundation on Personalized Learning] [CNN on Personalized Learning]

Can you build a model of you?
  • Sadly, not 100% yet.
  • Individualized models need individual-specific data.
  • The amount of data is fundamentally limited, hence "small".
  • Most modern learning algorithms require big data about the individual.

3 learning settings
  • Multi-task learning
  • Domain adaptation
  • Zero-shot learning
Primary application focus: computer vision

Vignette 1: Multi-task Learning (MTL)
"When everyone gathers firewood, the flames rise high."

Problem setting
  • M tasks, each with its own data
  • Need to find solutions for all of them

Traditional framework for supervised learning
  • Solve each task independently:
    $\arg\min_{w_m} \; \ell(D_m; w_m) + \lambda_m R(w_m)$

Main idea
  • Learn multiple related tasks jointly
  • Force knowledge sharing
  • Combine small data into big data

Benefits
  • Improves generalization performance
  • Requires less data
  • Works in both deep and shallow learning models

Joint objective
    $\arg\min_{w_1, w_2, \dots, w_M} \; \sum_{m=1}^{M} \ell(D_m; w_m) + \lambda R(w_1, w_2, \dots, w_M)$
[… et al.; Argyriou et al. 08; Daumé 09; …]

Exploiting task relatedness
  • Encode prior knowledge by selecting the regularizer $R(w_1, w_2, \dots, w_M)$
  • Constrain the hypothesis space for all tasks
Choices of regularizer
  • All parameters are similar to each other
  • Parameters should have similar sparsity patterns

[Figure: object category classifiers (e.g., "polar bear") and attribute classifiers (e.g., "white", "spots") built on shared features of the input visual feature. Object categories and attributes, CVPR, 2011]

Analogies
  • leopard : cat = wolf : dog
  • leopard : tiger = horse : zebra
[Figure: analogies act as regularization between the visual feature space and the semantic embedding space. Analogy-preserving embedding, ICML, 2013]
Sharing ontologies [NIPS 2…]

Not all tasks are beneficial
  • How to discover groups of related subtasks?
  • "Learning with whom to share" (ICML, 2011)
  • "Resisting the temptation to share" (CVPR, 2014)
Why is this useful?
  • Learning in noisy task data
  • Learning from a set of irrelevant tasks
  • Ex: computational biology, noisy labels
[Figure: tasks w_1, w_2, w_3, w_4 partitioned into Group 1 and Group 2]

Vignette 2: Domain adaptation

Classification task: given a face image, determine man or woman?
  • Collect a lot of labeled images (training data: man, woman)
  • Infer a classification boundary
  • Classify on a test image: success
  • Training and test data share statistical properties that are useful for classification
  • Tell-tale feature: length of hair

Mismatch between training and testing
  • On unseen data, "length of hair" is no longer effective!

Unrealistic, oversimplifying assumptions
  • The learning environment is stationary.
  • Training, testing and future data are sampled i.i.d. from the same distribution.
  • Works well in academic / well-controlled settings.
In real life
  • The learning environment changes.
  • Training, testing and future data are sampled from different distributions.
  • We suffer from poor cross-distribution generalization: accuracy drops significantly on disparate domains.

Computer vision
  • Object recognition: train and test on different datasets
  • Vehicle pedestrian avoidance systems: train and test in different vehicular / city environments
Natural language processing
  • Syntactic parsing: train on business articles but apply to medical journals
  • Speech recognition: train on native speakers but apply to accented voices

Challenges
  • Many exogenous factors affect visual appearance: pose, illumination, camera quality, etc.
  • Collecting data under all possible combinations of those factors is expensive.
  • Labeling those data is even more costly.

[Figure: example images from the 4 domains in our empirical studies: Caltech-256, Webcam, Amazon, DSLR]
[Figure: accuracy; effect of using bigger datasets for adaptation (larger source). Series: Amazon, Webcam, ImageNet, Adapted Amazon, Adapted ImageNet. Anonymous source, 2014]

How to adapt?
  • Domain-invariant features: linear subspaces, theoretical motivation
  • Exploit intrinsic structures: learn kernels, discriminative clustering
(ICML 13, NIPS 13)
[Figure: source domain and target domain as points on the Grassmann manifold of subspaces; the geodesic flow between them captures a domain-invariant representation (for visual recognition)]

Shared representation
[Ben-David et al. '06, Blitzer et al. '06, Daumé III '07, Pan et al. 09, …]
  • There exists a (latent) feature space.
  • The marginals of the source and the target are the same (or similar) in this space.
  • There exists a single classifier that works well on both domains.
    $e_T[h] \le e_S[h] + A(P_S, P_T) + \inf_{h \in H} \big[ e_T[h] + e_S[h] \big]$
  • $A(P_S, P_T)$: the distributions are similar
  • $\inf_{h \in H} [\,\cdot\,]$: how well a single classifier can do

Domain-invariant features
  • Parameterized as a linear kernel mapping of the original features
  • Constructed to minimize the discrepancy between the two domains
  • Model domains with subspaces, i.e., points on the Grassmann manifold G(d, D)
  • Compute discrepancy as differences between subspaces

[Figure: results for No adaptation, SGF (Gopalan et al., ICCV 2011), Geodesic flow kernel (GFK, ours) and Landmark on the domain pairs C --> A, A --> W, W --> C, D --> A, C --> D, A --> C]

Vignette 3: Zero-shot learning

Classical machine learning framework
  • Multiway classification
  • The labeling space is determined a priori.
  • A large number of annotated training samples for every class
Challenges for recognition in the wild
  • The labeling space grows arbitrarily large with the emergence of new classes.
  • Collecting data for new classes is not always cost-effective.
  • Some classes have too few labeled images, or none at all.
[Figure: example classes "cat", "flower", "bench", "dog", "bear", "bird"]

Number of species (total: 1,589,361)
  • Birds: 9,956
  • Fish: 30,000
  • Mammals: 5,416
  • Reptiles: 8,240
  • Insects: 950,000
  • Corals: 2,175
  • Plants: 297,326
  • Mushrooms: 16,000
New species keep emerging, e.g., the "Skywalker" gibbon. The same holds for objects, and similarly in ImageNet.

Two types of classes
  • Seen: with a lot of labeled examples (e.g., cat, horse)
  • Unseen: without any examples
What is it: bear-like, with black and white stripes, and often with bamboo?
[Figures from Derek Hoiem's slides]

Class labels ≠ discrete numbers
  • Need to assign semantic meanings to class labels
  • Need to define relationships among class labels
Key assumptions
  • There is a common semantic space shared by both types of classes (seen and unseen).
  • The configuration of the embeddings enables "transfer".

Semantic embeddings
  • Attributes (Farhadi et al. 09, Lampert et al. 09, Parikh & Grauman 11, …)
  • Word vectors (Mikolov et al. 13, Socher et al. 13, Frome et al. 13, …)

[Figure: seen objects and an unseen object described by attributes: Brown, Muscular, Has Snout, Has Mane (like horse), Has Snout (like dog). Figures from Derek Hoiem's slides]
How to effectively construct a model for zebra?

Training
  • Seen classes and their semantic embeddings: $\mathcal{S} = \{1, 2, \dots, S\}$, $A_{\mathcal{S}} = \{a_1, a_2, \dots, a_S\}$
  • Annotated training samples: $D = \{(x_n, y_n)\}_{n=1}^{N}$
Goal
  • Unseen classes and their semantic embeddings: $\mathcal{U} = \{S+1, \dots, S+U\}$, $A_{\mathcal{U}} = \{a_{S+1}, a_{S+2}, \dots, a_{S+U}\}$
  • Classifier: $f: x \to y \in \mathcal{U}$

Synthesized classifiers for zero-shot learning
[Figure: class embeddings a_1, a_2, a_3 (e.g., Cardinal) and phantom-class embeddings b_1, b_2, b_3 in the semantic space; the corresponding classifiers w_1, w_2, w_3 and bases v_1, v_2, v_3 in the model space]
  • Introduce phantom classes as bases
  • Learn the bases' semantic embeddings as well as the models for the bases
  • Graph structures encode "relatedness"
  • Define how classes are related in the semantic embedding space
  • Define how classes are related in the model space

[Figure: semantic representations (a_Gadwall, a_Cedar Waxwing, a_House Wren) and PCA-projected visual features placed in a semantic embedding space; the class exemplars predicted from a_u are used for NN classification or to improve existing ZSL approaches]

[Table: Datasets, with a "Total #" column; rows include AwA and CUB]
[Table: Classification accuracy on AwA, CUB, SUN, ImageNet]
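
Returning to the joint objective of Vignette 1 (Multi-task Learning), the sketch below shows how "all parameters are similar to each other" can be turned into a concrete regularizer. It is a minimal illustration under stated assumptions, not the method from the slides: the loss is taken to be squared error, the regularizer is assumed to be the sum of squared distances of each w_m to the mean of all w_m, a small ridge term is added only to keep the linear systems well-posed, and the function names (fit_independent, fit_joint, make_tasks) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_independent(tasks, ridge=1e-3):
    """Solve each task alone: argmin_w ||X w - y||^2 + ridge * ||w||^2."""
    d = tasks[0][0].shape[1]
    return np.stack([
        np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
        for X, y in tasks
    ])


def fit_joint(tasks, lam, ridge=1e-3, n_iter=200):
    """Fit all tasks jointly with a 'parameters are similar' regularizer.

    Assumed objective (an illustration, not taken from the slides):
        sum_m ||X_m w_m - y_m||^2 + ridge * ||w_m||^2
        + lam * sum_m ||w_m - w_bar||^2,   w_bar = mean of the w_m.
    Solved by alternating between w_bar and each w_m.
    """
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    for _ in range(n_iter):
        w_bar = W.mean(axis=0)  # best shared centre for the current W
        for m, (X, y) in enumerate(tasks):
            # Closed-form minimizer over w_m with w_bar held fixed.
            A = X.T @ X + (ridge + lam) * np.eye(d)
            W[m] = np.linalg.solve(A, X.T @ y + lam * w_bar)
    return W


def make_tasks(n_tasks=6, n_per_task=4, d=5, noise=0.1, seed=0):
    """Synthetic related tasks: each true w_m is a shared vector plus a small offset.

    Each task has fewer samples than parameters, i.e. 'small data'.
    """
    rng = np.random.default_rng(seed)
    w_shared = rng.normal(size=d)
    W_true = w_shared + 0.1 * rng.normal(size=(n_tasks, d))
    tasks = []
    for w in W_true:
        X = rng.normal(size=(n_per_task, d))
        tasks.append((X, X @ w + noise * rng.normal(size=n_per_task)))
    return tasks, W_true


tasks, W_true = make_tasks()
for name, W in [("independent", fit_independent(tasks)),
                ("joint", fit_joint(tasks, lam=5.0))]:
    print(name, "parameter error:", np.linalg.norm(W - W_true))
```

Setting lam=0 recovers the independent per-task solutions, while larger lam pulls the task parameters toward a common centre, which is the sense in which several small datasets are pooled into a bigger one. Whether the joint fit actually reduces the error depends on how related the tasks really are, in line with the "not all tasks are beneficial" caveat.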
