



Speech Recognition: English-Chinese Parallel Text (Translated Foreign Literature)

Speech Recognition

Victor Zue, Ron Cole, & Wayne Ward
MIT Laboratory for Computer Science, Cambridge, Massachusetts, USA
Oregon Graduate Institute of Science & Technology, Portland, Oregon, USA
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA

1 Defining the Problem

Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words. The recognized words can be the final results, as for applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding, a subject covered in a separate section.

Speech recognition systems can be characterized by many parameters, some of the more important of which are shown in the table below. An isolated-word speech recognition system requires that the speaker pause briefly between words, whereas a continuous speech recognition system does not. Spontaneous, or extemporaneously generated, speech contains disfluencies and is much more difficult to recognize than speech read from a script. Some systems require speaker enrollment: a user must provide samples of his or her speech before using them. Other systems are said to be speaker-independent, in that no enrollment is necessary. Some of the other parameters depend on the specific task. Recognition is generally more difficult when vocabularies are large or have many similar-sounding words. When speech is produced in a sequence of words, language models or artificial grammars are used to restrict the combination of words.

The simplest language model can be specified as a finite-state network, where the permissible words following each word are given explicitly. More general language models approximating natural language are specified in terms of a context-sensitive grammar.

One popular measure of the difficulty of the task, combining the vocabulary size and the language model, is perplexity, loosely defined as the geometric mean of the number of words that can follow a word after the language model has been applied (language modeling in general, and perplexity in particular, are discussed in a separate section). Finally, there are some external parameters that can affect speech recognition system performance, including the characteristics of the environmental noise and the type and the placement of the microphone.

Parameters        Range
Speaking Mode     Isolated words to continuous speech
Speaking Style    Read speech to spontaneous speech
Enrollment        Speaker-dependent to speaker-independent
Vocabulary        Small (fewer than 20 words) to large (more than 20,000 words)
Language Model    Finite-state to context-sensitive
Perplexity        Small (under 10) to large (over 100)
SNR               High (above 30 dB) to low (below 10 dB)
Transducer        Voice-cancelling microphone to telephone

Table: Typical parameters used to characterize the capability of speech recognition systems

Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. These phonetic variabilities are exemplified by the acoustic differences of a single phoneme across contexts. At word boundaries, contextual variations can be quite dramatic, making "gas shortage" sound like "gash shortage" in American English, and "devo andare" sound like "dev andare" in Italian.

Second, acoustic variabilities can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variabilities can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to across-speaker variabilities.

The figure shows the major components of a typical speech recognition system. The digitized speech signal is first transformed into a set of useful measurements or features at a fixed rate, typically once every 10 to 20 msec (see section 11.3 for digital signal processing; signal representation is covered in its own section). These measurements are then used to search for the most likely word candidate, making use of constraints imposed by the acoustic, lexical, and language models. Throughout this process, training data are used to determine the values of the model parameters.

Figure: Components of a typical speech recognition system.

Speech recognition systems attempt to model the sources of variability described above in several ways. At the level of signal representation, researchers have developed representations that emphasize perceptually important speaker-independent features of the signal, and de-emphasize speaker-dependent characteristics. At the acoustic phonetic level, speaker variability is typically modeled using statistical techniques applied to large amounts of data. Speaker adaptation algorithms have also been developed that adapt speaker-independent acoustic models to those of the current speaker during system use (covered in a separate section). Effects of linguistic context at the acoustic phonetic level are typically handled by training separate models for phonemes in different contexts; this is called context dependent acoustic modeling.

Word level variability can be handled by allowing alternate pronunciations of words in representations known as pronunciation networks. Common alternate pronunciations of words, as well as effects of dialect and accent, are handled by allowing search algorithms to find alternate paths of phonemes through these networks. Statistical language models, based on estimates of the frequency of occurrence of word sequences, are often used to guide the search through the most probable sequence of words.

The dominant recognition paradigm in the past fifteen years is known as hidden Markov models (HMM). An HMM is a doubly stochastic model, in which the generation of the underlying phoneme string and the frame-by-frame surface acoustic realizations are both represented probabilistically as Markov processes, as discussed in section 11.2 and elsewhere. Neural networks have also been used to estimate the frame based scores; these scores are then integrated into HMM-based system architectures, in what has come to be known as hybrid systems, as described in section 11.5.

An interesting feature of frame-based HMM systems is that speech segments are identified during the search process, rather than explicitly. An alternate approach is to first identify speech segments, then classify the segments and use the segment scores to recognize words. This approach has produced competitive recognition performance in several tasks.

2 State of the Art

Comments about the state-of-the-art need to be made in the context of specific applications which reflect the constraints on the task. Moreover, different technologies are sometimes appropriate for different tasks. For example, when the vocabulary is small, the entire word can be modeled as a single unit. Such an approach is not practical for large vocabularies, where word models must be built up from subword units.

Performance of speech recognition systems is typically described in terms of word error rate E, defined as:

$$E = \frac{S + I + D}{N} \times 100\%$$

where N is the total number of words in the test set, and S, I, and D are the total number of substitutions, insertions, and deletions, respectively.

The past decade has witnessed significant progress in speech recognition technology. Word error rates continue to drop by a factor of 2 every two years. Substantial progress has been made in the basic technology, leading to the lowering of barriers to speaker independence, continuous speech, and large vocabularies. There are several factors that have contributed to this rapid progress. First, there is the coming of age of the HMM. HMM is powerful in that, with the availability of training data, the parameters of the model can be trained automatically to give optimal performance.

Second, much effort has gone into the development of large speech corpora for system development, training, and testing. Some of these corpora are designed for acoustic phonetic research, while others are highly task specific. Nowadays, it is not uncommon to have tens of thousands of sentences available for system training and testing. These corpora permit researchers to quantify the acoustic cues important for phonetic contrasts and to determine parameters of the recognizers in a statistically meaningful way. While many of these corpora (e.g., TIMIT, RM, ATIS, and WSJ; see section 12.3) were originally collected under the sponsorship of the U.S. Defense Advanced Research Projects Agency (ARPA) to spur human language technology development among its contractors, they have nevertheless gained world-wide acceptance (e.g., in Canada, France, Germany, Japan, and the U.K.) as standards on which to evaluate speech recognition.

Third, progress has been brought about by the establishment of standards for performance evaluation. Only a decade ago, researchers trained and tested their systems using locally collected data, and had not been very careful in delineating training and testing sets. As a result, it was very difficult to compare performance across systems, and a system's performance typically degraded when it was presented with previously unseen data. The recent availability of a large body of data in the public domain, coupled with the specification of evaluation standards, has resulted in uniform documentation of test results, thus contributing to greater reliability in monitoring progress (corpus development activities and evaluation methodologies are summarized in chapters 12 and 13 respectively).

Finally, advances in computer technology have also indirectly influenced our progress. The availability of fast computers with inexpensive mass storage capabilities has enabled researchers to run many large scale experiments in a short amount of time. This means that the elapsed time between an idea and its implementation and evaluation is greatly reduced. In fact, speech recognition systems with reasonable performance can now run in real time using high-end workstations without additional hardware, a feat unimaginable only a few years ago.

One of the most popular, and potentially most useful, tasks with low perplexity (PP = 11) is the recognition of digits. For American English, speaker-independent recognition of digit strings spoken continuously and restricted to telephone bandwidth can achieve an error rate of 0.3% when the string length is known.

One of the best known moderate-perplexity tasks is the 1,000-word so-called Resource Management (RM) task, in which inquiries can be made concerning various naval vessels in the Pacific Ocean. The best speaker-independent performance on the RM task is less than 4%, using a word-pair language model that constrains the possible words following a given word (PP = 60). More recently, researchers have begun to address the issue of recognizing spontaneously generated speech. For example, in the Air Travel Information Service (ATIS) domain, word error rates of less than 3% have been reported for a vocabulary of nearly 2,000 words and a bigram language model with a perplexity of around 15.

High perplexity tasks with a vocabulary of thousands of words are intended primarily for the dictation application. After working on isolated-word, speaker-dependent systems for many years, the community has since 1992 moved towards very-large-vocabulary (20,000 words and more), high-perplexity (PP ≈ 200), speaker-independent, continuous speech recognition. The best system in 1994 achieved an error rate of 7.2% on read sentences drawn from North American business news.

With the steady improvements in speech recognition performance, systems are now being deployed within telephone and cellular networks in many countries. Within the next few years, speech recognition will be pervasive in telephone networks around the world. There are tremendous forces driving the development of the technology; in many countries, touchtone penetration is low, and voice is the only option for controlling automated services. In voice dialing, for example, users can dial 10 to 20 telephone numbers by voice (e.g., "call home") after having enrolled their voices by saying the words associated with telephone numbers. AT&T, on the other hand, has installed a call routing system using speaker-independent word-spotting technology that can detect a few key phrases (e.g., "person to person", "calling card") in sentences such as: "I want to charge it to my calling card."

At present, several very large vocabulary dictation systems are available for document generation. These systems generally require speakers to pause between words. Their performance can be further enhanced if one can apply constraints of the specific domain, such as dictating medical reports.

Even though much progress is being made, machines are a long way from recognizing conversational speech. Word recognition rates on telephone conversations in the Switchboard corpus are around 50%. It will be many years before unlimited vocabulary, speaker-independent continuous dictation capability is realized.

3 Future Directions

In 1992, the U.S. National Science Foundation sponsored a workshop to identify the key research challenges in the area of human language technology, and the infrastructure needed to support the work. The following research areas for speech recognition were identified:

Robustness: In a robust system, performance degrades gracefully (rather than catastrophically) as conditions become more different from those under which it was trained. Differences in channel characteristics and acoustic environment should receive particular attention.

Portability: Portability refers to the goal of rapidly designing, developing, and deploying systems for new applications. At present, systems tend to suffer significant degradation when moved to a new task. In order to return to peak performance ...
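The word error rate E from the State of the Art section, built from substitutions (S), insertions (I), deletions (D), and the reference word count (N), can be illustrated with a minimal Python sketch. The function names `align_counts` and `word_error_rate` are hypothetical, and the sketch assumes whitespace tokenization and exact string matching; evaluation tools used in practice may normalize text differently. When several alignments tie for the minimum number of errors, the split between S, I, and D can vary, but their sum (and therefore E) does not.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (S, I, D) for one minimum-edit-distance alignment of hyp to ref."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cell[i][j] holds (total_errors, S, I, D) for ref[:i] versus hyp[:j].
    cell = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        cell[i][0] = (i, 0, 0, i)  # every reference word so far was deleted
    for j in range(1, cols):
        cell[0][j] = (j, 0, j, 0)  # every hypothesis word so far was inserted
    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                # Matching words cost nothing; carry the diagonal cell forward.
                cell[i][j] = cell[i - 1][j - 1]
                continue
            t, s, ins, d = cell[i - 1][j - 1]
            substitution = (t + 1, s + 1, ins, d)
            t, s, ins, d = cell[i][j - 1]
            insertion = (t + 1, s, ins + 1, d)
            t, s, ins, d = cell[i - 1][j]
            deletion = (t + 1, s, ins, d + 1)
            # Tuples compare on total errors first, so min() keeps a cheapest path.
            cell[i][j] = min(substitution, insertion, deletion)
    _, s, ins, d = cell[-1][-1]
    return s, ins, d


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute E = (S + I + D) / N * 100, with N the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    s, ins, d = align_counts(ref, hyp)
    return 100.0 * (s + ins + d) / len(ref)


if __name__ == "__main__":
    reference = "i want to charge it to my calling card"
    hypothesis = "i want to charge it to my card"
    print(align_counts(reference.split(), hypothesis.split()))
    print(word_error_rate(reference, hypothesis))
```

Because insertions count as errors, E can exceed 100% when the hypothesis is much longer than the reference.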