Conversational Computers: Always 10 Years Away?

Kai-Fu Lee
Corporate Vice President
Microsoft Corporation

Why Conversational Interface?
  Speech: “invented” for interaction
    “[Speech & language are] a biological adaptation to communicate information… One of nature’s engineering marvels” – Steven Pinker
    “Vision evolved from the need to survive; speech evolved from the need to communicate” – Michael Dertouzos
  Benefits of “Conversational Interface”
    “To me, speech recognition will be a transforming capability… when you can speak to your computer and it will understand what you're saying in context.” – Gordon Moore
    “Speech and natural language understanding are the key technologies that will have the most impact in the next 15 years.” – Bill Gates
  Future UI visions assume conversational UI
    Apple’s “Knowledge Navigator”
    Microsoft’s “information at your fingertips”
  Science fiction movies assume conversational UI

But “Always” 10 Years Away
  1950: Jerome Wiesner predicted that machine translation might be possible by 1960
  1957: Herbert Simon predicted that machines would match human performance in many areas by 1967
  1969: US Expert Panel predicted “voice I/O will be in common use by 1978”
  1993: I predicted that every PC would ship with speech recognition by 2003
  1998: Gartner Group predicted that PC UI would assume voice input by 2003

Decomposing the Prediction
  Natural Language Understanding
  Speech Recognition
  Text to Speech

Talk Outline
  Speech recognition
  Text to speech
  Natural language understanding
  Why have we been a constant 10 years away?
  My 3-year & 10-year predictions

Fundamental Equation of Speech Recognition
  X is the acoustic waveform
  W is the word string
  A speech recognizer finds W such that
  W = argmax_W p(W|X) = argmax_W p(X|W) p(W)
  p(X|W) is the acoustic model
  p(W) is the language model

Statistical Modeling

Improving the acoustic model – p(X|W)
  Statistical Approach
    1. Build a detailed statistical model for each word. Detail could be based on phonetics, speaker, dialect, gender, or data-driven details etc.
    2. Collect a lot more samples for each word. There is no data like more data.
    3. Go to step one.

Improving the language model – p(W)
  Statistical Approach – Trigrams.
  There is no data like more data.
  This helps recognition, not understanding.

Does Moore’s Law Help Speech?
  Moore’s law is necessary but not sufficient
    Just faster chips means recognition errors appear faster.
  Super-Moore’s law for speech:
    Faster processors/memory/disk
    + Getting more real data & feedback loop
    + Improved statistical models
  Result:
    Moore’s law doubles performance in 18 months
    Super-Moore’s law halves errors in 60 months

Speech Recognition:
Approaching Human Error Rate
  [Chart; surviving labels: Microsoft licensed CMU Sphinx-II; Whisper in MSR; Speech in Office XP; Speech in Tablet/Office 11; Speech in Longhorn; Human Error Rate]

Fundamental Approach for TTS
  Concatenative Synthesis
    Concatenation of pre-recorded speech units
  Front-end
    Natural language processing (word breaking, POS…)
    Determine emphasis to drive speed, pitch, loudness
  Back-end
    Collect a lot of data
    Carefully segment & store in a database
    Select the best units from the database
  Find statistical metrics that match “naturalness”, e.g., smoothness rather than specific duration targets
  Use these metrics to select units

Text to Speech: Approaching Human Naturalness
  [Chart; surviving labels: Naturalness; Human Naturalness]

ASR & TTS: Optimization & Engineering
  By leveraging Moore’s law
    Exponential improvements from…
    Faster CPU + bigger database + better algorithm
  Approaching human abilities, but not AI, but…
    Optimization, or “speech engineering”
  Still falls short of humans on:
    Learning, adaptation
    Robustness to environment
  But many applications just from ASR & TTS:
    ASR: Dictation, speech search, speaker verification, language learning…
    TTS: Telephony info access, voice fonts, voice conversion…

Natural Language Understanding Combines:
  Syntax (rules of the human’s language)
    Nouns, verbs, etc. and how they combine
    “Book about a trip to Chicago” vs. “Book a trip to Chicago”
    Normalize linguistic variations
  Semantics
    Meaning of the words
    Book means reserve a ticket; requires from-city, to-city, etc.
  Context (additional hints)
    Domain knowledge: No train from Hawaii to Chicago
    Statistics: Book as a noun > Book as a verb “Book Chicago”
    Personal Preferences: Where you live, your calendar, how you pay…
    Model of time, urgency, presence
  Dialog (resolving ambiguity & determine intent)
    “Buy a book or book travel?”
    “What date would you like to travel?”

Applying Statistics to Understanding
  Engineering approach:
    Focus on one domain, engineer all the knowledge.
    Collect data & create feedback loop to improve.
  Applying Bayes Rule to understanding
    W is the word string
    M is the meaning
    A speech recognizer finds M such that
    M = argmax_M p(M|W) = argmax_M p(W|M) p(M)
    p(W|M) models all the ways to express a “meaning”
    p(M) is the semantic model

What is “unsolved” by Statistics?
  Fusion of many sources of knowledge
  Domain-free understanding
  Instant context switching
  General knowledge
    History, sports, etc.
  Common sense reasoning
    “Least common of all senses”
  Ambiguity
    “Mr. Wright should write to Mrs. Wright right away”
  Emotion, humor, etc.
  Many of the challenges are “AI-complete”

Milestones in Speech Technology Research
  [Timeline chart; axis: 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002]
  Eras labeled on the chart:
    Small Vocabulary, Acoustic Phonetics-based
    Medium Vocabulary, Template-based
    Large Vocabulary, Statistical-based
    Large Vocabulary; Syntax, Semantics
    Very Large Vocabulary; Semantics, Multimodal Dialog, TTS
  Task and technique groups labeled on the chart:
    Isolated Words: Filter-bank analysis; Time-normalization; Dynamic programming
    Isolated Words; Connected Digits; Continuous Speech: Pattern recognition; LPC analysis; Clustering algorithms
    Continuous Speech; Speech Understanding: Stochastic language understanding; Finite-state machines; Statistical learning
    Connected Words; Continuous Speech: Hidden Markov models; Stochastic language modeling
    Spoken dialog; Multiple modalities: Concatenative synthesis; Machine learning; Mixed-initiative dialog
  Fueled by Moore’s Law + Data + Research

Why Constant 10 Years Away?
  Immature technology
    Improving but only recently becoming useful
  Over-sold expectations
    Science fiction movies
    Effective (but not real product) demos
  Under-estimated risks
    User habits are hard to change
    Cost of developing speech application is high
  Things are different now!
    Technology is ready
    And we have learned our lessons.

What Have We Learned?
  Don’t make predictions.
    …based on extrapolating from one data point!
  There is no data like more data.
    Real data & feedback > Moore’s Law.
  Change the world, one domain at a time.
    Breakthrough from data + rigor is just fine.
  Start with user’s comfort zone.
  Start with the greatest customer need & business opportunity.

3-Year Speech Prediction:
Most Realistic Near-Term Speech Application
  [Chart; surviving labels only]
  Applications: Meeting/Voicemail Transcription; Mobile Devices/Cars; Telephony/Call Center; Accessibility; Desktop Dictation; Windows Commands & Applications/API
  Criteria: Market Opportunity; Technology Readiness; Customer Need; Poor Alternative

10-Year Speech Predictions
  [Table; surviving labels only]
  Telephony; Devices; Desktop
  Dictation & New applications
  All phones have speech; Mainstream app 2005
  Accessibility & Asian Dictation
  Mobility & Automotiv
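
The decision rule from the "Fundamental Equation of Speech Recognition" slide, W = argmax p(X|W) p(W), can be illustrated with a minimal sketch. Everything concrete below is an assumption made for illustration and is not from the talk: the two candidate word strings, their acoustic scores, their language-model probabilities, and the function name decode. A real recognizer would compute these quantities from trained acoustic and language models and would search a far larger hypothesis space.

```python
import math

# Hypothetical acoustic scores log p(X | W) for a single utterance X.
# Illustrative numbers only; a real system derives them from an acoustic model.
acoustic_log_likelihood = {
    "recognize speech": -12.0,
    "wreck a nice beach": -11.5,
}

# Hypothetical language-model probabilities p(W), e.g. from trigram counts.
# Also illustrative numbers only.
language_model_prob = {
    "recognize speech": 1e-4,
    "wreck a nice beach": 1e-7,
}


def decode(acoustic, lm):
    """Return the W that maximizes log p(X|W) + log p(W)."""
    return max(acoustic, key=lambda w: acoustic[w] + math.log(lm[w]))


# The acoustic model alone slightly prefers the implausible word string.
acoustic_only = max(acoustic_log_likelihood, key=acoustic_log_likelihood.get)

# Adding the language model prior changes the decision.
best = decode(acoustic_log_likelihood, language_model_prob)

print("acoustic model only:", acoustic_only)
print("acoustic + language model:", best)
```

With these made-up numbers the acoustic score alone favors "wreck a nice beach", while the combined score favors "recognize speech", which is the sense in which the slides say the language model helps recognition. Working in log probabilities avoids numerical underflow when many small probabilities are multiplied.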