![Kai-Fu Lee: Conversational Interface, page 1](http://file4.renrendoc.com/view/fda36e8a766234cbc0a1641750a27cb1/fda36e8a766234cbc0a1641750a27cb11.gif)
![Kai-Fu Lee: Conversational Interface, page 2](http://file4.renrendoc.com/view/fda36e8a766234cbc0a1641750a27cb1/fda36e8a766234cbc0a1641750a27cb12.gif)
![Kai-Fu Lee: Conversational Interface, page 3](http://file4.renrendoc.com/view/fda36e8a766234cbc0a1641750a27cb1/fda36e8a766234cbc0a1641750a27cb13.gif)
![Kai-Fu Lee: Conversational Interface, page 4](http://file4.renrendoc.com/view/fda36e8a766234cbc0a1641750a27cb1/fda36e8a766234cbc0a1641750a27cb14.gif)
![Kai-Fu Lee: Conversational Interface, page 5](http://file4.renrendoc.com/view/fda36e8a766234cbc0a1641750a27cb1/fda36e8a766234cbc0a1641750a27cb15.gif)
Conversational Computers:
Always 10 Years Away?

Kai-Fu Lee
Corporate Vice President
Microsoft Corporation

Why Conversational Interface?
- Speech: “invented” for interaction
  - “[Speech & language are] a biological adaptation to communicate information… One of nature’s engineering marvels” – Steven Pinker
  - “Vision evolved from the need to survive; speech evolved from the need to communicate” – Michael Dertouzos
- Benefits of “Conversational Interface”
  - “To me, speech recognition will be a transforming capability… when you can speak to your computer and it will understand what you're saying in context.” – Gordon Moore
  - “Speech and natural language understanding are the key technologies that will have the most impact in the next 15 years.” – Bill Gates
- Future UI visions assume conversational UI
  - Apple’s “Knowledge Navigator”.
  - Microsoft’s “information at your fingertips”.
  - Science fiction movies assume conversational UI

But “Always” 10 Years Away
- 1950: Jerome Wiesner predicted that machine translation may be possible by 1960
- 1957: Herbert Simon predicted that by 1967 machines will match human performance in many areas
- 1969: US Expert Panel predicted “voice I/O will be in common use by 1978”
- 1993: I predicted that by 2003 every PC will ship with speech recognition
- 1998: Gartner Group predicted that PC UI will assume voice input by 2003

Decomposing the Prediction
- Natural Language Understanding
- Speech Recognition
- Text to Speech

Talk Outline
- Speech recognition
- Text to speech
- Natural language understanding
- Why have we been a constant 10 years away?
- My 3-year & 10-year predictions

Fundamental Equation of Speech Recognition
- X is the acoustic waveform
- W is the word string
- A speech recognizer finds W such that

$$\hat{W} = \arg\max_{W} p(W \mid X) = \arg\max_{W} p(X \mid W)\, p(W)$$

- p(X|W) is the acoustic model
- p(W) is the language model

Statistical Modeling
- Improving the acoustic model – p(X|W)
  - Statistical Approach
    - Step one: build a detailed statistical model for each word. Detail could be based on phonetics, speaker, dialect, gender, or data-driven details etc.
    - Collect a lot more samples for each word. There is no data like more data.
    - Go to step one.
- Improving the language model – p(W)
  - Statistical Approach – Trigrams.
  - There is no data like more data.
  - This helps recognition, not understanding.

Does Moore’s Law Help Speech?
- Moore’s law is necessary but not sufficient
  - Just faster chips means recognition errors appear faster.
- Super-Moore’s law for speech:
  - Faster processors/memory/disk
  - + Getting more real data & feedback loop
  - + Improved statistical models
- Result:
  - Moore’s law doubles performance in 18 months
  - Super-Moore’s law halves errors in 60 months

Speech Recognition:
Approaching Human Error Rate

[Chart: recognition error rate compared with the human error rate. Data point labels: Microsoft licensed CMU Sphinx-II; Whisper in MSR; Speech in Office XP; Speech in Tablet/Office 11; Speech in Longhorn]

Fundamental Approach for TTS
- Concatenative Synthesis
  - Concatenation of pre-recorded speech units
- Front-end
  - Natural language processing (word breaking, POS…)
  - Determine emphasis to drive speed, pitch, loudness.
- Back-end
  - Collect a lot of data
  - Carefully segment & store in a database
  - Select the best units from the database
    - Find statistical metrics that match “naturalness”, e.g., smoothness rather than specific duration targets
    - Use these metrics to select units

Text to Speech
Approaching Human Naturalness

[Chart: naturalness of synthesized speech compared with human naturalness]

ASR & TTS: Optimization & Engineering
By leveraging Moore’s law
- Exponential improvements from…
  - Faster CPU + bigger database + better algorithm
- Approaching human abilities, but not AI, but…
  - Optimization, or “speech engineering”
- Still falls short of humans on:
  - Learning, adaptation.
  - Robustness to environment.
- But many applications just from ASR & TTS:
  - ASR: Dictation, speech search, speaker verification, language learning…
  - TTS: Telephony info access, voice fonts, voice conversion…

Natural Language Understanding Combines:
- Syntax (rules of the human’s language)
  - Nouns, verbs, etc. and how they combine
  - “Book about a trip to Chicago” vs. “Book a trip to Chicago”
  - Normalize linguistic variations.
- Semantics
  - Meaning of the words
  - Book means reserve a ticket; requires from-city, to-city, etc.
- Context (additional hints)
  - Domain knowledge: No train from Hawaii to Chicago
  - Statistics: Book as a noun > Book as a verb
  - “Book Chicago”
  - Personal Preferences: Where you live, your calendar, how you pay…
  - Model of time, urgency, presence
- Dialog (resolving ambiguity & determining intent)
  - “Buy a book or book travel?”
  - “What date would you like to travel?”

Applying Statistics to Understanding
- Engineering approach:
  - Focus on one domain, engineer all the knowledge.
  - Collect data & create feedback loop to improve.
- Applying Bayes Rule to understanding
  - W is the word string
  - M is the meaning
  - A speech recognizer finds M such that

$$\hat{M} = \arg\max_{M} p(M \mid W) = \arg\max_{M} p(W \mid M)\, p(M)$$

  - p(W|M) models all the ways to express a “meaning”
  - p(M) is the semantic model

What is “unsolved” by Statistics?
- Fusion of many sources of knowledge
- Domain-free understanding
  - Instant context switching
- General knowledge
  - History, sports, etc.
- Common sense reasoning
  - “Least common of all senses”
- Ambiguity
  - “Mr. Wright should write to Mrs. Wright right away”
- Emotion, humor, etc.
- Many of the challenges are “AI-complete”

Milestones in Speech Technology Research
[Timeline figure with axis marks at 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997 and 2002. Labels, in extraction order:]
- Isolated Words
- Filter-bank analysis; Time-normalization; Dynamic programming
- Isolated Words; Connected Digits; Continuous Speech
- Pattern recognition; LPC analysis; Clustering algorithms
- Continuous Speech; Speech Understanding
- Stochastic language understanding; Finite-state machines; Statistical learning
- Small Vocabulary, Acoustic Phonetics-based
- Medium Vocabulary, Template-based
- Large Vocabulary; Syntax, Semantics
- Connected Words; Continuous Speech
- Large Vocabulary, Statistical-based
- Hidden Markov models; Stochastic Language modeling
- Spoken dialog; Multiple modalities
- Very Large Vocabulary; Semantics, Multimodal Dialog, TTS
- Concatenative synthesis; Machine learning; Mixed-initiative dialog

Fueled by Moore’s Law + Data + Research

Why Constant 10 Years Away?
- Immature technology
  - Improving but only recently becoming useful
- Over-sold expectations
  - Science fiction movies
  - Effective (but not real product) demos
- Under-estimated risks
  - User habits are hard to change
  - Cost of developing speech applications is high
- Things are different now!
  - Technology is ready
  - And we have learned our lessons.

What Have We Learned?
- Don’t make predictions.
  - …based on extrapolating from one data point!
- There is no data like more data.
  - Real data & feedback > Moore’s Law.
- Change the world, one domain at a time.
  - Breakthrough from data + rigor is just fine.
- Start with user’s comfort zone.
- Start with the greatest customer need & business opportunity.

3-Year Speech Prediction:
Most Realistic Near-Term Speech Application

[Chart. Axis or criterion labels: Market Opportunity; Technology Readiness; Customer Need; Poor Alternative. Applications shown:]
- Meeting/Voicemail Transcription
- Mobile Devices/Cars
- Telephony/Call Center
- Accessibility
- Desktop Dictation
- Windows Commands & Applications/API

10-Year Speech Predictions

[Chart labels, in extraction order:]
- Telephony
- Devices
- Desktop
- Dictation & New applications
- All phones have speech; Mainstream app
- 2005
- Accessibility & Asian Dictation
- Mobility & Automotiv
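
The fundamental equation of speech recognition from earlier in the talk, W = argmax p(X|W) p(W), can be illustrated with a toy decoder. This is only a minimal sketch under stated assumptions, not the method of any recognizer named in the slides. The candidate word strings, their acoustic scores and the three-sentence corpus are invented for illustration. A bigram model with add-one smoothing stands in for the trigram model the slides mention.

```python
import math
from collections import Counter

# Hypothetical training text for the language model p(W).
CORPUS = [
    "book a trip to chicago",
    "book a flight to chicago",
    "read a book about chicago",
]

# Hypothetical acoustic scores log p(X|W) for two candidate word strings.
# The acoustically better candidate is the misrecognition "buck ...".
ACOUSTIC = {
    "book a trip to chicago": -10.0,
    "buck a trip to chicago": -9.5,
}


def train_bigram(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def lm_logprob(sentence, unigrams, bigrams):
    """Return log p(W) under an add-one smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
        for a, b in zip(words, words[1:])
    )


def decode(acoustic_logprobs, unigrams, bigrams):
    """Return argmax over W of log p(X|W) + log p(W)."""
    return max(
        acoustic_logprobs,
        key=lambda w: acoustic_logprobs[w] + lm_logprob(w, unigrams, bigrams),
    )


unigrams, bigrams = train_bigram(CORPUS)
best = decode(ACOUSTIC, unigrams, bigrams)
print(best)  # prints "book a trip to chicago"
```

With the acoustic score alone, the decoder would pick "buck a trip to chicago". Adding the language model term reverses the choice, which is the role p(W) plays in the equation.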