Speech Recognition: Foreign Literature with Parallel Chinese-English Text

Speech Recognition

Victor Zue, Ron Cole, & Wayne Ward
MIT Laboratory for Computer Science, Cambridge, Massachusetts, USA
Oregon Graduate Institute of Science & Technology, Portland, Oregon, USA
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in a separate section.

Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in the table below. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment (a user must provide samples of his or her speech before using them), whereas other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.

The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (language modeling in general, and perplexity in particular, are discussed in a separate section). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

Parameters        Range
Speaking Mode     Isolated words to continuous speech
Speaking Style    Read speech to spontaneous speech
Enrollment        Speaker-dependent to speaker-independent
Vocabulary        Small (< 20 words) to large (> 20,000 words)
Language Model    Finite-state to context-sensitive
Perplexity        Small (< 10) to large (> 100)
SNR               High (> 30 dB) to low (< 10 dB)
Transducer        Voice-cancelling microphone to telephone

Table: Typical parameters used to characterize the capability of speech recognition systems

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences among realizations of the same phoneme. At word boundaries, contextual variations can be quite dramatic, making "gas shortage" sound like "gash shortage" in American English and "devo andare" sound like "dev andare" in Italian. Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.

The figure below shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 to 20 msec (see section 11.3 for digital signal processing; signal representation is covered in a separate section). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use (discussed in a separate section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context-dependent acoustic modeling.

Word-level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.

The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in section 11.2, among others. Neural networks have also been used to estimate the frame-based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2 State of the Art

Comments about the state of the art need to be made in the context of specific applications, which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate E, defined as:

$$E = \frac{S + I + D}{N} \times 100\%$$

where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress.

First, there is the coming of age of the HMM. The HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task-specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13, respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large-scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware, a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful, tasks with low perplexity (PP = 11) is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is less than 4%, using a word-pair language model that constrains the possible words following a given word (PP = 60). More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.

High-perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity (PP ≈ 200), speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world. There are tremendous forces driving the development of the technology; in many countries, touch-tone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10 to 20 telephone numbers by voice (e.g., "call home") after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., "person to person", "calling card") in sentences such as: "I want to charge it to my calling card."

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain, such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited-vocabulary, speaker-independent, continuous dictation capability is realized.

3 Future Directions

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology and the infrastructure needed to support the work. The key research challenges have been summarized elsewhere. The following research areas for speech recognition were identified:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing, and deploying systems for new applications. At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance
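The word error rate used in section 2 to report system performance can be illustrated with a short Python sketch. The text defines E only in terms of the counts S, I, D, and N and does not say how those counts are obtained; the sketch below assumes they come from a minimum-edit-distance (Levenshtein) alignment of the recognized words against a reference transcript, which is a common convention rather than something stated here. The function name `word_error_rate` and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return E = (S + I + D) / N * 100 for one reference/hypothesis pair.

    Assumption: S + I + D is taken as the minimum number of word
    substitutions, insertions, and deletions needed to turn the
    reference into the hypothesis (Levenshtein distance over words).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(1, len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)

    # N is the number of words in the reference
    return 100.0 * dist[-1][-1] / len(ref)


if __name__ == "__main__":
    reference = "i want to charge it to my calling card"
    hypothesis = "i want to charge to my calling car"
    # One deletion ("it") and one substitution ("card" -> "car"), N = 9
    print(f"{word_error_rate(reference, hypothesis):.1f}%")
```

Note that N counts reference words only, so a hypothesis with many insertions can yield an error rate above 100%.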