Deep Features for Text Spotting

Max Jaderberg, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford

Abstract. The goal of this work is text spotting in natural images. This is divided into two sequential tasks: detecting word regions in the image, and recognizing the words within these regions. We make the following contributions: first, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, character case-sensitive and insensitive classification, and bigram classification. It exceeds the state-of-the-art performance for all of these. Second, we make a number of technical changes over the traditional CNN architectures, including no downsampling for a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we have a method of automated data mining of Flickr that generates word and character level annotations. Finally, these components are used together to form an end-to-end, state-of-the-art text spotting system. We evaluate the text-spotting system on two standard benchmarks, the ICDAR Robust Reading dataset and the Street View Text dataset, and demonstrate improvements over the state-of-the-art on multiple measures.
1 Introduction

While text recognition from scanned documents is well studied and there are many available systems, the automatic detection and recognition of text within images, known as text spotting (Fig. 1), is far less developed. However, text contained within images can be of great semantic value, and so is an important step towards both information retrieval and autonomous systems. For example, text spotting of numbers in street view data allows the automatic localization of house numbers in maps [20], reading street and shop signs gives robotic vehicles scene context [39], and indexing large volumes of video data with text obtained by text spotting enables fast and accurate retrieval of video data from a text search [26].

Fig. 1. (a) An end-to-end text spotting result from the presented system on the SVT dataset. (b) Randomly sampled cropped word data automatically mined from Flickr with a weak baseline system, generating extra training data.

[...] pipeline. To achieve this we use a Convolutional Neural Network (CNN) [27] and generate a per-pixel text/no-text saliency map, a case-sensitive and case-insensitive character saliency map, and a bigram saliency map. The text saliency map drives the proposal of word bounding boxes, while the character and bigram saliency maps assist in recognizing the word within each bounding box through a combination of soft costs. Our work is inspired by the excellent performance of CNNs for character classification [6,8,47].

Our contributions are threefold. First, we introduce a method to share features [44] which allows us to extend our character classifiers to other tasks such as character detection and bigram classification at a very small extra cost: we first generate a single rich feature set, by training a strongly supervised character classifier, and then use the intermediate hidden layers as features for text detection, character case-sensitive and insensitive classification, and bigram classification. This procedure makes best use of the available training data: plentiful for character/non-character but less so for the other tasks. It is reminiscent of the Caffe idea [14], but here it is not necessary to have external sources of training data. A second key novelty in the context of text detection is to leverage the convolutional structure of the CNN to process the entire image in one go instead of running CNN classifiers on each cropped character proposal [27]. This allows us to generate efficiently, in a single pass, all the features required to detect word bounding boxes, and that we use for recognizing words from a fixed lexicon using the Viterbi algorithm. We also make a technical contribution in showing that our CNN architecture using maxout [21] as the non-linear activation function has superior performance to the more standard rectified linear unit. Our third contribution is a method for automatically mining and annotating data (Fig. 1). Since CNNs can have many millions of trainable parameters, we require a large corpus of training data to minimize overfitting, and mining is useful to cheaply extend available data. Our mining method crawls images from the Internet to automatically generate word level and character level bounding box annotations, and a separate method is used to automatically generate character level bounding box annotations when only word level bounding box annotations are supplied.

In the following we first describe the data mining procedure (Sect. 2) and then the CNN architecture and training (Sect. 3). Our end-to-end (image in, text out) text spotting pipeline is described in Sect. 4. Finally, Sect. 5 evaluates the method on a number of standard benchmarks. We show that the performance exceeds the state of the art across multiple measures.
Related Work. Decomposing the text-spotting problem into text detection and text recognition was first proposed by [12]. Authors have subsequently focused solely on text detection [7,11,16,50,51], or text recognition [31,36,41], or on combining both in end-to-end systems [40,39,49,32-34,45,35,6,8,48]. Text detection methods are either based on connected components (CCs) [11,16,50,49,32-35] or sliding windows [40,7,39,45]. Connected component methods segment pixels into characters, then group these into words. For example, Epshtein et al. take characters as CCs of the stroke width transform [16], while Neumann and Matas [34,33] use Extremal Regions [29], or more recently oriented strokes [35], as CCs representing characters. Sliding window methods approach text spotting as a standard task of object detection. For example, Wang et al. [45] use a random ferns [38] sliding window classifier to find characters in an image, grouping them using a pictorial structures model [18] for a fixed lexicon. Wang & Wu et al. [47] build on the fixed lexicon problem by using CNNs [27] with unsupervised pre-training as in [13]. Alsharif et al. [6] and Bissacco et al. [8] also use CNNs for character classification; both methods over-segment a word bounding box and find an approximate solution to the optimal word recognition result, in [8] using beam search and in [6] using a Hidden Markov Model. The works by Mishra et al. [31] and Novikova et al. [36] focus purely on text recognition, assuming a perfect text detector has produced cropped images of words. In [36], Novikova combines both visual and lexicon consistency into a single probabilistic model.
2 Data mining for word and character annotations

In this section we describe a method for automatically mining suitable photo sharing websites to acquire word and character level annotated data. This annotation is used to provide additional training data for the CNN in Sect. 5.

Word Mining. Photo sharing websites such as Flickr [3] contain a large range of scenes, including those containing text. In particular, the “Typography and Lettering” group on Flickr [4] contains mainly photos or graphics containing text. As the text depicted in the scenes is the focus of the images, the user-given titles of the images often include the text in the scene. Capitalizing on this weakly supervised information, we develop a system to find title text within the image, automatically generating word and character level bounding box annotations.

Using a weak baseline text-spotting system based on the Stroke Width Transform (SWT) [16] and described in Sect. 5, we generate candidate word detections for each image from Flickr. If a detected word is the same as any of the image's title text words, and there are the same number of characters from the SWT detection phase as word characters, we say that this is an accurate word detection, and use this detection as positive text training data. We set the parameters so that the recall of this process is very low (out of 130,000 images, only 15,000 words were found), but the precision is greater than 99%. This means the precision is high enough for the mined Flickr data to be used as positive training data, but the recall is too low for it to be used for background no-text training data. We will refer to this dataset as FlickrType, which contains 6792 images, 14920 words, and 71579 characters. Fig. 1 shows some positive cropped words randomly sampled from the automatically generated FlickrType dataset. Although this procedure will cause a bias towards scene text that can be found with a simple end-to-end pipeline, it still generates more training examples that can be used to prevent the overfitting of our models.
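The matching rule used to keep a detection as positive data can be summarized in a few lines of Python. This is an illustrative sketch rather than the actual mining code; the function and data-structure names are hypothetical stand-ins for the SWT baseline's real detection output.

```python
def is_accurate_word_detection(detected_word, char_boxes, image_title):
    """Keep a detection only if (1) the detected string matches one of the
    words in the user-given image title and (2) the SWT detection phase
    produced exactly one character box per character of that word."""
    title_words = image_title.lower().split()
    word = detected_word.lower()
    return word in title_words and len(char_boxes) == len(word)


def mine_flickr_annotations(images):
    """images: iterable of (image, title, detections) triples, where detections
    is a list of (word_string, word_box, char_boxes) tuples produced by the
    weak SWT-based baseline. Returns word- and character-level positives."""
    positives = []
    for image, title, detections in images:
        for word, word_box, char_boxes in detections:
            if is_accurate_word_detection(word, char_boxes, title):
                positives.append((image, word, word_box, char_boxes))
    return positives
```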

26、tingofourmodels.AutomaticCharacterAnnotation.InadditiontominingdatafromFlickr,wealsousethewordrecognitionsystemdescribedinSect.4.2toautomaticallygeneratecharacterboundingboxannotationsfordatasetswhichonlyhavewordlevelboundingboxannotations.Foreachcroppedword,weperformtheoptimalttingofthegroundtrutht

27、exttothecharactermapusingthemethoddescribedinSect.4.2.Thisplacesinter-characterbreakpointswithimpliedcharactercen-ters,whichcanbeusedasroughcharacterboundingboxes.WedothisfortheSVTandOxfordCornmarketdatasets(thataredescribedinsection5),allowingustotrainandtestonanextra22,000croppedcharactersfromthos

3 Feature learning using a Convolutional Neural Network

The workhorse of a text-spotting system is the character classifier. The output of this classifier is used to recognize words and, in our system, to detect image regions that contain text. Text-spotting systems appear to be particularly sensitive to the performance of character classification; for example, in [8] increasing the accuracy of the character classifier by 7% led to a 25% increase in word recognition. In this section we therefore concentrate on maximizing the performance of this component.

To classify an image patch x as one of the possible characters (or background), we extract a set of features $\Phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_K(x))$ and then learn a binary classifier $f_c$ for each character c of the alphabet C. Classifiers are learned to yield a posterior probability distribution $p(c|x) = f_c(\Phi(x))$ over characters, and the latter is maximized to recognize the character $\bar{c}$ contained in patch x: $\bar{c} = \arg\max_{c \in C} p(c|x)$. Traditionally, features are manually engineered and optimized through a laborious trial-and-error cycle involving adjusting the features and re-learning the classifiers. In this work, we propose instead to learn the representation using a CNN [27], jointly optimizing the performance of the features as well as of the classifiers. As noted in the recent literature, a well designed learnable representation of this type can in fact yield substantial performance gains [25].
CNNs are obtained by stacking multiple layers of features. A convolutional layer consists of K linear filters followed by a non-linear response function. The input to a convolutional layer is a feature map $z_i(u,v)$, where $(u,v) \in \Omega_i$ are spatial coordinates and $z_i(u,v) \in \mathbb{R}^C$ contains C scalar features or channels $z_i^k(u,v)$. The output is a new feature map $z_{i+1}$ such that $z_{i+1}^k = h_i(W_{ik} * z_i + b_{ik})$, where $W_{ik}$ and $b_{ik}$ denote the k-th filter kernel and bias respectively, and $h_i$ is a non-linear activation function such as the Rectified Linear Unit (ReLU) $h_i(z) = \max\{0, z\}$. Convolutional layers can be intertwined with normalization, subsampling, and max-pooling layers which build translation invariance in local neighborhoods. The process starts with $z_1 = x$ and ends by connecting the last feature map to a logistic regressor for classification. All the parameters of the model are jointly optimized to minimize the classification loss over a training set using Stochastic Gradient Descent (SGD), back-propagation, and other improvements discussed in Sect. 3.1.
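For illustration, a minimal numpy/scipy sketch of such a layer is given below; it is a dense reference implementation with 'valid' support and a ReLU, not the cuda-convnet code used in our experiments. The example dimensions match the first layer of the character CNN described later (a 24x24 patch and 96 filters of size 9x9 give a 96-channel 16x16 map).

```python
import numpy as np
from scipy.signal import correlate

def conv_layer(z, W, b, h=lambda t: np.maximum(t, 0.0)):
    """One convolutional layer, z_{i+1}^k = h_i(W_ik * z_i + b_ik).

    z: input feature map of shape (C, H, W_in); W: filter bank of shape
    (K, C, kh, kw); b: biases of shape (K,). Uses 'valid' support (no padding)
    and cross-correlation, as is standard in CNN implementations."""
    out = [correlate(z, W[k], mode='valid')[0] + b[k] for k in range(W.shape[0])]
    return h(np.stack(out, axis=0))

# Example: a 24x24 grayscale patch and 96 filters of size 9x9 -> a 96x16x16 map.
z1 = np.random.randn(1, 24, 24)
W1 = 0.01 * np.random.randn(96, 1, 9, 9)
b1 = np.zeros(96)
print(conv_layer(z1, W1, b1).shape)   # (96, 16, 16)
```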

Instead of using ReLUs as the activation function $h_i$, in our experiments it was found empirically that maxout [21] yields superior performance. Maxout, in particular when used in the final classification layer, can be thought of as taking the maximum response over a mixture of n linear models, allowing the CNN to easily model multiple modes of the data. The maxout of two feature channels $z_i^1$ and $z_i^2$ is simply their pointwise maximum: $h_i(z_i(u,v)) = \max\{z_i^1(u,v), z_i^2(u,v)\}$. More generally, the k'-th maxout operator $h_i^{k'}$ is obtained by selecting a subset $G_i^{k'} \subset \{1, 2, \ldots, K\}$ of feature channels and computing the maximum over them: $h_i^{k'}(z_i(u,v)) = \max_{k \in G_i^{k'}} z_i^k(u,v)$. While different grouping strategies are possible, here groups are formed by taking g consecutive channels of the input map: $G_i^1 = \{1, 2, \ldots, g\}$, $G_i^2 = \{g+1, g+2, \ldots, 2g\}$, and so on. Hence, given K feature channels as input, maxout constructs $K' = K/g$ new channels.
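A minimal numpy sketch of this grouped-channel maxout (groups of g consecutive channels, assuming the channel count is divisible by g):

```python
import numpy as np

def maxout(z, g):
    """Channel-grouped maxout: for a feature map z of shape (K, H, W), take the
    pointwise maximum over groups of g consecutive channels, giving K' = K / g
    output channels (K must be divisible by g)."""
    K, H, W = z.shape
    assert K % g == 0
    return z.reshape(K // g, g, H, W).max(axis=1)

# Example matching the first layer described below: 96 channels, g = 2 -> 48.
z = np.random.randn(96, 16, 16)
print(maxout(z, 2).shape)   # (48, 16, 16)
```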

This section discusses the details of learning the character classifiers. Training is divided into two stages. In the first stage, a case-insensitive CNN character classifier is learned. In the second stage, the resulting feature maps are applied to other classification problems as needed. The output is four state-of-the-art CNN classifiers: a character/background classifier, a case-insensitive character classifier, a case-sensitive character classifier, and a bigram classifier.

Stage 1: Bootstrapping the case-insensitive classifier. The case-insensitive classifier uses a four-layer CNN outputting a probability $p(c|x)$ over an alphabet C including all 26 letters, 10 digits, and a noise/background (no-text) class, giving a total of 37 classes (Fig. 2). The input $z_1 = x$ to the CNN is a grayscale cropped character image of 24×24 pixels, zero-centered and normalized by subtracting the patch mean and dividing by the standard deviation. Due to the small input size, no spatial pooling or downsampling is performed. Starting from the first layer, the input image is convolved with 96 filters of size 9×9, resulting in a map of size 16×16 (to avoid boundary effects) and 96 channels. The 96 channels are then pooled with maxout in groups of size g = 2, resulting in 48 channels. The sequence continues by convolving with 128, 512, 148 filters of side 9, 8, 1 and maxout groups of size g = 2, 4, 4, resulting in feature maps with 64, 128, 37 channels and size 8×8, 1×1, 1×1 respectively. The last 37 channels are fed into a soft-max to convert them into character probabilities. In practice we use 48 channels in the final classification layer rather than 37, as the software we use, based on cuda-convnet [25], is optimized for multiples of 16 convolutional filters; we do however use the additional 12 classes as extra no-text classes, abstracting this to 37 output classes.
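The layer dimensions quoted above can be checked with a short sketch. The following PyTorch approximation is not the cuda-convnet implementation used in this work; it assumes 'valid' convolutions, the channel-grouped maxout defined earlier, and collapses the 148-filter final layer directly to its 37 effective output channels, ignoring the extra no-text channels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def maxout(x, g):
    # Channel-grouped maxout on a batched map: (N, K, H, W) -> (N, K // g, H, W).
    n, k, h, w = x.shape
    return x.reshape(n, k // g, g, h, w).max(dim=2).values

class CharCNN(nn.Module):
    """Sketch of the Stage-1 case-insensitive classifier: 24x24 input,
    convolutions of side 9, 9, 8, 1 with 96, 128, 512, 148 filters and
    maxout groups g = 2, 2, 4, 4, giving 48, 64, 128, 37 channels."""
    def __init__(self):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(1, 96, 9),     # 24x24 -> 16x16
            nn.Conv2d(48, 128, 9),   # 16x16 -> 8x8
            nn.Conv2d(64, 512, 8),   # 8x8   -> 1x1
            nn.Conv2d(128, 148, 1),  # 1x1   -> 1x1
        ])
        self.groups = (2, 2, 4, 4)

    def forward(self, x):
        for conv, g in zip(self.convs, self.groups):
            x = maxout(conv(x), g)
        return x                     # per-location scores for the 37 classes

net = CharCNN()
print(net(torch.randn(1, 1, 24, 24)).shape)   # torch.Size([1, 37, 1, 1])

# With no downsampling anywhere, the same filters applied to a whole image give
# a per-pixel map; padding by 23 pixels in total preserves the input resolution.
full = F.pad(torch.randn(1, 1, 100, 200), (11, 12, 11, 12))
print(net(full).shape)                        # torch.Size([1, 37, 100, 200])
```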

We train using stochastic gradient descent and back-propagation, and also use dropout [22] in all layers except the first convolutional layer to help prevent overfitting. Dropout simply involves randomly zeroing a proportion of the parameters; the proportion we keep for each layer is 1, 0.5, 0.5, 0.5. The training data is augmented by random rotations and noise injection. By omitting any downsampling in our network and ensuring the output for each class is one pixel in size, it is immediate to apply the learnt filters on a full image in a convolutional manner to obtain a per-pixel output without a loss of resolution, as shown in the second image of Fig. 4. Fig. 3 illustrates the learned CNN by using the visualization technique of [43].

Fig. 3. Visualizations of each character class learnt from the 37-way case-insensitive character classifier CNN. Each image is synthetically generated by maximizing the posterior probability of a particular class. This is implemented by back-propagating the error from a cost layer that aims to maximize the score of that class [43,17].
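The class visualizations of Fig. 3 amount to gradient ascent on the input: starting from a random patch, back-propagate the gradient of one class score into the image and repeatedly step the image to increase that score. The sketch below assumes the CharCNN sketch above (untrained here) and omits the regularization details of [43,17].

```python
import torch

def visualize_class(net, target_class, size=24, steps=200, lr=0.1):
    """Gradient-ascent sketch of the Fig. 3 visualizations: optimize a random
    input patch so that the score the network gives to `target_class` grows."""
    x = torch.randn(1, 1, size, size, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        score = net(x)[0, target_class].sum()   # score of the chosen class
        (-score).backward()                      # ascend by minimizing -score
        opt.step()
    return x.detach()

# e.g. with the (untrained) CharCNN sketch above; class 10 is purely illustrative:
# img = visualize_class(CharCNN().eval(), target_class=10)
```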

Stage 2: Learning the other character classifiers. Training on a large amount of annotated data, and also including a no-text class in our alphabet, means the hidden layers of the network produce feature maps highly adept at discriminating characters, and can be adapted for other classification tasks related to text. We use the outputs of the second convolutional layer as our set of discriminative features, $\Phi(x) = z_2$. From these features, we train a 2-way text/no-text classifier¹, a 63-way case-sensitive character classifier, and a bigram classifier, each one using a two-layer CNN acting on $\Phi(x)$ (Fig. 2). The last two layers of each of these three CNNs result in feature maps with 128-2, 128-63, and 128-604 channels respectively, all resulting from maxout grouping of size g = 4. These are all trained with $\Phi(x)$ as input, with dropout of 0.5 on all layers, and fine-tuned by adaptively reducing the learning rate. The bigram classifier recognises instances of two adjacent characters, e.g. Fig. 6.

¹ Training a dedicated classifier was found to yield superior performance to using the background class in the 37-way case-insensitive character classifier.

These CNNs could have been learned independently. However, sharing the first two layers has two key advantages. First, the low-level features learned from case-insensitive character classification allow sharing training data among tasks, reducing overfitting and improving performance in classification tasks with less informative labels (text/no-text classification), or tasks with fewer training examples (case-sensitive character classification, bigram classification). Second, it allows sharing computations, significantly increasing the efficiency.
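A minimal sketch of this shared-feature arrangement, with the trunk output $\Phi(x) = z_2$ taken to be the 64-channel 8×8 maps of the Stage 1 sketch above. The text specifies only the post-maxout sizes (128 and the per-task output counts) and the group size g = 4, so the pre-maxout channel counts used below (four times larger) are an assumption.

```python
import torch
import torch.nn as nn

def maxout(x, g):
    n, k, h, w = x.shape
    return x.reshape(n, k // g, g, h, w).max(dim=2).values

class Head(nn.Module):
    """Two-layer task head on the shared features Phi(x) = z2 (64 x 8 x 8 here).
    Pre-maxout channel counts are assumptions; only 128 / n_out and g = 4 are
    given in the text."""
    def __init__(self, n_out, g=4):
        super().__init__()
        self.g = g
        self.conv1 = nn.Conv2d(64, 128 * g, 8)     # 8x8 -> 1x1, maxout -> 128
        self.conv2 = nn.Conv2d(128, n_out * g, 1)  # 1x1 -> 1x1, maxout -> n_out

    def forward(self, phi):
        return maxout(self.conv2(maxout(self.conv1(phi), self.g)), self.g)

# One shared trunk feeds three heads: text/no-text, case-sensitive, bigram.
text_head, case_head, bigram_head = Head(2), Head(63), Head(604)
phi = torch.randn(1, 64, 8, 8)   # stand-in for the shared features z2
print(text_head(phi).shape, case_head(phi).shape, bigram_head(phi).shape)
# torch.Size([1, 2, 1, 1]) torch.Size([1, 63, 1, 1]) torch.Size([1, 604, 1, 1])
```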

4 End-to-End Pipeline

This section describes the various stages of the proposed end-to-end text spotting system, making use of the features learnt in Sect. 3. The pipeline starts with a detection phase (Sect. 4.1) that takes a raw image and generates candidate bounding boxes of words, making use of the text/no-text classifier. The words contained within these bounding boxes are then recognized against a fixed lexicon of words (Sect. 4.2), driven by the character classifiers, the bigram classifier, and other geometric cues.

The aim of the detection phase is to start from a large, raw pixel input image and generate a set of rectangular bounding boxes, each of which should contain the image of a word. This detection process (Fig. 4) is tuned for high recall, and generates a set of candidate word bounding boxes. The process starts by computing a text saliency map by evaluating the character/background CNN classifier in a sliding window fashion across the image, which has been appropriately zero-padded so that the resulting text saliency map is the same resolution as the original image. As the CNN is trained to detect text at a single canonical height, this process is repeated for 16 different scales to target text heights between 16 and 260 pixels by resizing the input image.

Fig. 4. The detector phase for a single scale. From left to right: input image, CNN generated text saliency map using the text/no-text classifier, after the run length smoothing phase, after the word splitting phase, the implied bounding boxes. Subsequently, the bounding boxes will be combined at multiple scales and undergo filtering and non-maximal suppression.
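For concreteness, one way to realize this multi-scale search is to space the 16 target text heights geometrically between 16 and 260 pixels and resize the image so that each target height maps to the classifier's canonical 24-pixel input height; both the geometric spacing and the 24-pixel canonical height are assumptions, since the distribution of the 16 scales is not stated.

```python
import numpy as np

# Assumed geometric spacing of the 16 target text heights and an assumed
# canonical text height of 24 pixels (the CNN's input patch size).
target_heights = np.geomspace(16, 260, num=16)
scales = 24.0 / target_heights   # resize factor applied to the input image
print(np.round(target_heights).astype(int))
print(np.round(scales, 3))
```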

Given these saliency maps, word bounding boxes are generated independently at each scale in two steps. The first step is to identify lines of text. To this end, the probability map is first thresholded to find local regions of high probability. Then these regions are connected into text lines by using the run length smoothing algorithm (RLSA): for each row of pixels the mean $\mu$ and standard deviation $\sigma$ of the spacings between probability peaks are computed, and neighboring regions are connected if the space between them is less than $3\mu - 0.5\sigma$. Finding connected components of the linked regions results in candidate text lines.
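A small numpy sketch of this run-length smoothing step on one row of the thresholded map (the $3\mu - 0.5\sigma$ threshold is reconstructed from a partly garbled formula, and gaps between active runs stand in for the spacings between probability peaks):

```python
import numpy as np

def rlsa_row(active, k_mu=3.0, k_sigma=0.5):
    """Fill gaps between active runs in one row of the thresholded saliency map
    when the gap is shorter than k_mu * mean - k_sigma * std of the row's gaps."""
    idx = np.flatnonzero(active)
    out = active.copy()
    if idx.size < 2:
        return out
    gaps = np.diff(idx) - 1          # pixel gaps between consecutive active pixels
    gaps = gaps[gaps > 0]
    if gaps.size == 0:
        return out
    thresh = k_mu * gaps.mean() - k_sigma * gaps.std()
    for a, b in zip(idx[:-1], idx[1:]):
        if 0 < b - a - 1 < thresh:   # short gap: link the two regions
            out[a:b] = True
    return out

# Smoothing every row and labelling connected components of the result
# (e.g. with scipy.ndimage.label) yields the candidate text lines.
row = np.array([0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1], dtype=bool)
print(rlsa_row(row).astype(int))
```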

The next step is to split text lines into words. For this, the image is cropped to just that of a text line and Otsu thresholding [37] is applied to roughly segment foreground characters from background. Adjacent connected components (which are hopefully segmented characters) are then connected if their horizontal spacings are less than the mean horizontal spacing for the text line, again using RLSA. The resulting connected components give candidate bounding boxes for individual words, which are then added to the global set of bounding boxes at all scales. Finally, these bounding boxes are filtered based on geometric constraints (box height, aspect ratio, etc.) and undergo non-maximal suppression, sorting them by decreasing average per-pixel text saliency score.
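A sketch of this word-splitting step for a single cropped text line, assuming dark text on a lighter background and using standard skimage/scipy helpers; the resulting spans would still pass through the geometric filtering and non-maximal suppression described above.

```python
import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def split_line_into_words(line_img):
    """Otsu-threshold a cropped text line, treat connected components as rough
    characters, and merge components whose horizontal gap is below the mean gap
    for the line (RLSA-style). Returns a list of (x0, x1) word spans in pixels."""
    fg = line_img < threshold_otsu(line_img)        # assumes dark text, light background
    labels, n = ndimage.label(fg)
    if n == 0:
        return []
    slices = sorted(ndimage.find_objects(labels), key=lambda s: s[1].start)
    xs = [(s[1].start, s[1].stop) for s in slices]  # horizontal extents, left to right
    gaps = [max(0, nxt[0] - cur[1]) for cur, nxt in zip(xs[:-1], xs[1:])]
    mean_gap = np.mean(gaps) if gaps else 0.0
    words, cur = [], list(xs[0])
    for (x0, x1), gap in zip(xs[1:], gaps):
        if gap < mean_gap:                          # small gap: same word
            cur[1] = max(cur[1], x1)
        else:                                       # large gap: start a new word
            words.append(tuple(cur))
            cur = [x0, x1]
    words.append(tuple(cur))
    return words
```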

The aim of the word recognition stage is to take the candidate cropped word images $I \in \mathbb{R}^{W \times H}$ of width W and height H and estimate the text contained in them. In order to recognize a word from a fixed lexicon, each word hypothesis is scored using [...]
