Key Technologies and Applications of Speech Coding (PPT courseware).ppt

Uploaded: 2020-03-21. Format: PPT. Pages: 66. Size: 4.19 MB.


Document summary

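Key Technologies and Applications of Medium- and Low-Rate Speech Coding
Department of Electronic Engineering, Cui Huijuan, 10.4.15

Examples of speech reconstructed by various compression coding algorithms
Original: 8 kHz sampling, 16 bit/sample, 128 kb/s
International standards:
- ITU-T G.728, 16 kb/s
- ITU-T G.729, 8 kb/s
- ITU-T G.723.1, 5.3 kb/s
- CVSD, 16 kb/s
- AMR, 4.75 kb/s
- EVRC, 4.8 kb/s
Our vocoders:
- MPD ACELP, 4 kb/s
- SELP, 2.4 kb/s
- SELP, 1.2 kb/s
- SELP, 0.8 kb/s
- SELP, 0.6 kb/s
- SELP, 0.3 kb/s

Contents
- Audio coding performance evaluation
- Current state of the art
- Basis for audio compression
- Human auditory characteristics
- Existing standards
- Speech production model and parametric coding
- Adaptive Predictive Coding (APC)
- Analysis-by-Synthesis Coding of Speech
- Perceptually Weighted Filter
- The DOD 4.8 kb/s Standard (CELP)

Audio coding performance evaluation (1)
1. Coding rate (kb/s)

   Signal bandwidth    Sampling rate
   300-3400 Hz         8 kHz
   50-7000 Hz          16 kHz
   20-15000 Hz         32 kHz
   10-20000 Hz         44.1 / 48 kHz

Bits per sample: R (b/sample). Total rate: I (kb/s).
Quality levels: intelligibility, naturalness, transparency.
The coding rate affects reconstruction quality, storage capacity, and transmission bandwidth.

Audio coding performance evaluation (2)
2. Reconstructed speech quality: objective evaluation
- Signal-to-noise ratio (SNR): above 15 dB is fairly good, above 20 dB is very good
- Segmental SNR
- PESQ (Perceptual Evaluation of Speech Quality)

Audio coding performance evaluation (3)
PESQ (Perceptual Evaluation of Speech Quality), ITU-T Recommendation P.862: an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
The closeness of the fit between PESQ and the subjective scores may be measured by calculating the correlation coefficient. Normally this is performed on condition-averaged scores, after mapping the objective scores to the subjective scores.

Audio coding performance evaluation (4)
PESQ: the correlation coefficient is calculated with Pearson's formula:

$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}} $$

In this formula, $x_i$ is the condition MOS for condition $i$, and $\bar{x}$ is the average over the condition MOS values $x_i$; $y_i$ is the mapped condition-averaged PESQ score for condition $i$, and $\bar{y}$ is the average over the predicted condition MOS values $y_i$.
For 22 known ITU benchmark experiments, the average correlation was 0.935. For an agreed set of eight experiments used in the final validation (experiments that were unknown during the development of PESQ), the average correlation was also 0.935.

Audio coding performance evaluation (5)
2. Reconstructed speech quality: subjective evaluation (1)
MOS (Mean Opinion Score), from 5 down to 1: Excellent, Good, Fair, Poor, Bad.

Audio coding performance evaluation (6)
2. Reconstructed speech quality: subjective evaluation (2)
- DRT (Diagnostic Rhyme Test): score = (correct - incorrect) / total × 100%. Example word pair: 为 (wei) / 费 (fei).
  95% and above: excellent; 85-94%: good; 75-84%: fair; 65-74%: poor; below 65%: unacceptable.
- DAT (Diagnostic Acceptability Test): an acceptability test based on multiple factors.
- MNRU (Modulated Noise Reference Unit).
- QDU (Quantization Distortion Unit): the distortion of one PCM encode/decode pass.

Audio coding performance evaluation (7)
3. Codec delay (ms)
- Public network: 25 ms; point-to-point; broadcast; storage
- Echo control or echo cancellation
- Normal conversational turn-taking
- Relationship to reconstruction quality
4. Algorithm complexity
- Hardware cost
- Floating point or fixed point
- MIPS, RAM, ROM
5. Other
- Robustness to random and burst bit errors
- Robustness to packet loss and frame loss
- Ability to code different kinds of signals
- Tandeming and transcoding capability

Current state of the art
- Wideband audio and wideband speech: high quality at 2 b/sample; next target 1 b/sample
- Telephone speech: high quality at 1 b/sample; next target 0.5 b/sample

Basis for audio compression
1. Redundancy
- Time domain: correlation between samples (short-term and long-term)
- Frequency domain: non-flat spectrum (spectral envelope, discrete spectral lines)
- Statistical properties of the speech signal: entropy coding
2. Human auditory characteristics
- The ear's sensitivity differs across frequency bands; it is usually more sensitive to low frequencies than to high frequencies.
- The ear is insensitive to the phase of speech signals.
- Masking effect: sound components that the ear cannot hear, or perceives only very weakly, can be treated as redundant.
- Psychoacoustic properties can be exploited through perceptual weighting, quantization that removes superfluous components, and post-filtering.

Characteristics of speech signals
- Speech is a non-stationary random process.
- It is time-varying but short-time stationary (10-20 ms), so it is processed frame by frame.

Statistical properties of speech signals
Classification of short-time stationary segments:
- Silence: least information
- Unvoiced: less information
- Voiced: more information
- Onset: most information

Human auditory characteristics (1): hearing range and hearing threshold
A normal listener can hear frequencies from 0.016 to 16 kHz and intensities from 0 to 120 dB SPL (sound pressure level). The reference sound pressure (0 dB SPL) is 20 µPa, i.e. 2 × 10^-5 N/m².
The free-field hearing threshold is the lowest sound pressure level a person can hear after entering the sound field. The pure-tone threshold depends on frequency: it is about 4 dB at 1000 Hz, rises to about 50 dB at 40 Hz, and to about 24 dB at 15 kHz.
The threshold of feeling is the highest tolerable sound pressure. When the sound pressure is high enough, the ear feels discomfort, or itching, pressure, and pain. For normal listeners, 120 dB is usually taken as the discomfort threshold and 140 dB as the pain threshold, and both are regarded as independent of frequency.

Human auditory characteristics (2): pitch
Pitch is the attribute that describes how high or low a sound's frequency is perceived to be. A low-frequency sound is heard as low-pitched and a high-frequency sound as high-pitched, but pitch is not proportional to frequency; it also depends on the intensity and waveform of the sound.
Pitch is measured on the Mel scale: a 1000 Hz pure tone 40 dB above the hearing threshold is defined to have a pitch of 1000 Mel. An approximate formula relates pitch to frequency (the formula is not preserved in this extract).
The ear's sensitivity differs across frequency bands; it is usually more sensitive to low frequencies than to high frequencies. The ear is insensitive to the phase of speech signals.

Human auditory characteristics (3): masking effect
When two sounds of unequal loudness reach the ear, the louder frequency component affects the perception of the quieter one and makes it harder to notice. This is called the masking effect.
Low-frequency sounds travel farther along the basilar membrane of the cochlea than higher-frequency sounds, so in general low tones easily mask high tones, while high tones mask low tones with difficulty.
Masking means that the presence of one sound raises the hearing threshold for another.

Human auditory characteristics (4): critical band
Noise affects the reception of a pure tone, that is, it masks the tone. The critical bandwidth is introduced to describe this masking. A pure tone can be masked by continuous noise of a certain bandwidth centred on the tone's frequency. When the tone is just at the threshold of audibility and the noise power within the band equals the power of the tone, that bandwidth is called the critical bandwidth. It can be measured experimentally.
The unit of critical bandwidth is the Bark. Sounds between 20 Hz and 16 kHz can be divided into 24 Bark. Roughly, one critical band corresponds to about 1.5 mm along the basilar membrane, or to about 1200 auditory nerve fibres.
The critical band number Z (Bark) is related to the frequency f (Hz) by an approximate formula (the formula is not preserved in this extract).

Existing standards: wideband audio
- ISO MPEG-1 (1991): SB (sub-band coding), DBA (dynamic bit allocation)
- ISO MPEG-2 (1993): sampling rates extended to 16, 22.05, and 24 kHz, with bandwidths of 7.5, 10.3, and 11.25 kHz respectively

PASC (MPEG-1 Layer 1)
PASC (Precision Adaptive Sub-band Coding, MPEG-1 Layer 1) is the algorithm Philips uses for DCC (Digital Compact Cassette).
Data rate required for direct quantization:
- Sampling frequency: 32 kHz, 44.1 kHz, or 48 kHz, with 16-bit samples
- Coding rate: 48 kHz × 16 bit = 768 kb/s; stereo: 1536 kb/s
In the DCC standard the maximum recording frequency is 48 kHz, the signal can be recorded on 8 tracks, and the permitted transmission rate is 384 kb/s.

PASC coding
- Split the full-band signal into 32 sub-bands: 512 consecutive input samples are filtered to produce 32 sub-band samples.
- Blocking: once each sub-band has accumulated 12 samples, they are processed as one unit, 32 × 12 = 384 samples in total.
- Scale factor selection: the maximum of the 12 samples in each sub-band becomes that sub-band's scale factor, which is quantized with 6 bits.
- Bit allocation: the signal energy in each sub-band determines its number of quantization bits. Auditory perception is used to allocate the limited bits economically, so that no bits are spent on components the ear could not hear even if they were coded.
- Each sample in each sub-band is PCM coded.

PASC coding principle
With the scheme above, PASC delivers very high quality at 384 kb/s. In measurements, the reconstructed waveforms of piano and guitar are indistinguishable from the originals, and the lower-frequency drum sound shows only slight differences.

MPEG Layer III
[Figure]

Existing standards: wideband speech
MLT (Modulated Lapped Transform)
The algorithm is based on transform technology, using a modulated lapped transform (MLT).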
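It operates on 20 ms frames (320 samples) of audio. Because the transform window (basis function length) is 640 samples and a 50% (320-sample) overlap is used between frames, the effective look-ahead buffer size is 20 ms. Hence the total algorithmic delay of 40 ms is the sum of the frame size plus look-ahead. All other delays are due to computational and network transmission delays.

Existing standards: embedded speech coding
G.729EV
- TDBWE: Time Domain Bandwidth Extension
- TDAC: Time Domain Aliasing Cancellation
G.729EV is the G.729-based Embedded Variable bit-rate coder: an 8-32 kbit/s scalable wideband coder whose bitstream is interoperable with G.729. At 8 kbit/s the encoder generates a bitstream in G.729 format (10 ms frames). The input sampling rate is 16000 Hz and the bitstream is divided into 20 ms frames (or superframes).

Existing standards: speech coding
[Figure]

Model of Speech Production
[Figure; equations (1) and (2) are not preserved in this extract]

Linear Predictive Coding Algorithm
Equation (3) is the well-known LP difference equation, which states that the value of the present output $s(n)$ may be determined by summing the weighted present input $G\,u(n)$ and a weighted sum of the past output samples:

$$ s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n) \tag{3} $$

[Figure: inverse filter producing the residue]
[Figure: analysis and synthesis; panels: original, residue]
[Figure: original, LPC residual, pitch residual]

Linear predictive speech compression coding
- Speech bandwidth 200-3400 Hz, sampled at 8000 Hz, quantized at 8 bit/sample: 64 kb/s PCM
- Because speech is short-time stationary, it can be processed in frames: 180 samples per frame, one set of feature parameters per frame

Parameters of the speech model (LPC)
1. Vocal tract parameters (predictive coefficients): Levinson-Durbin recursion
2. Pitch period: average magnitude difference function (AMDF), autocorrelation method
3. Energy (RMS)
4. Voicing decision (voiced/unvoiced): simple binary excitation

Bit allocation

   Parameter                       Bits per frame
   Pitch period and voicing        7
   Energy (RMS)                    5
   Predictive coefficients         41
   Sync                            1
   Total                           54

Frame: 180 samples (22.5 ms).
Exercise: compute the bit rate of this coder.
Government Standard Linear Predictive Coding Algorithm: LPC-10, by Thomas E. Tremain. Bit rate: 2.4 kb/s.
Question: what techniques could reduce the coding rate further?

Vector quantization of the parameters
Bit allocation

   Parameter                       Bits per frame
   V/UV decision                   3
   Pitch period                    9
   Energy (RMS)                    8
   Predictive coefficients         24
   Sync                            1
   Total                           45

Frame: 600 samples (75 ms).
Exercise: compute the bit rate of this coder.
Bit rate: 600 b/s vocoder.
Questions: how can the quality of vocoder-synthesized speech be improved? Which steps in the coding process degrade the synthesized speech quality?

SELP speech model
- Original speech
- Speech synthesized directly from the analysed parameters (parameters not quantized)
- Prediction redundancy
- Parameters quantized to 600 b/s: LSF, RMS, pitch, V/UV

Adaptive Predictive Coding (APC)
The APC scheme, as shown in the figure, was originally proposed by Atal and Schroeder, and it employs both the short-term predictor (STP) and the pitch or long-term predictor (LTP).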
The resultant excitation signal after inverse filtering is scalar quantized on a sample-by-sample basis. APC schemes have been proposed for 16 kb/s and below, with variations in their treatment of the residual signal. INMARSAT's Standard-B system employs a 16 kb/s APC and is used for maritime mobile systems.

The LTP can be interpreted as a difference equation (not preserved in this extract) in which D is the "pitch period", the "pitch gain" coefficients reflect the amount of correlation between the distant samples, and the remaining term is the excitation.

Analysis-by-Synthesis Coding of Speech
Coders for rates between 4.8 and 16 kb/s fall into two classes: analysis-and-synthesis (A-and-S) schemes and analysis-by-synthesis (A-by-S) schemes. Although A-and-S schemes such as RELP, APC, ATC, and SBC have been very successful at rates around 9.6-16 kb/s, below 9.6 kb/s they can no longer produce good quality speech. There are two main reasons for their shortcomings:
- The coded speech is not analyzed to see if the coding procedure is operating efficiently, i.e. there is no check or control over the distortions of the reconstructed speech.
- In adaptive schemes, the errors accumulated from previous frames are not considered in the current frame of analysis; hence the errors propagate into following frames without any form of resetting.

The A-by-S method is not unique to speech coding, but is a general technique used in other areas of estimation and identification. The basic idea behind A-by-S is as follows. First it is assumed that the signal can be observed and represented in some form, e.g. in the time or frequency domain. Then a theoretical form of the signal production model is assumed. The model has a number of parameters which can be varied to produce different ranges of the observable signal.

To derive a representation of the model that is of the same form as the true signal model, a trial-and-error procedure can be applied. By varying the parameters of the model in a systematic way, it is possible to find a set of parameters that produces a synthetic signal which matches the real signal with minimum error (assuming the model is valid to begin with). Therefore, when such a match is found, the parameters of the model are assumed to be the parameters of the true signal.

Analysis-by-Synthesis LPC (A-by-S LPC)
In A-by-S schemes, particularly A-by-S LPC schemes, the two main shortcomings of A-and-S are addressed. In A-by-S LPC coding systems, a "closed loop" optimization procedure is used to determine the excitation signal which, when used to excite the model filter, produces a perceptually optimum synthesized speech signal. It is this "closed loop" approach which enables these A-by-S LPC schemes to be far more successful at 4.8-9.6 kb/s than conventional A-and-S schemes such as APC and RELP.

This coder minimizes the error between the original and the synthesized signal according to a suitable error criterion by varying the excitation signal and the STP and LTP filters. This is achieved via a sequential procedure. First the time-varying filter parameters are determined; then, with these fixed, the excitation is optimized.
[Figure: decoder]

Perceptually Weighted Filter
The A-by-S LPC coder minimizes the error between the original signal and the synthesized signal according to a suitable criterion. However, at low bit rates there is only one bit per sample or less, so it is more difficult to match the waveform closely. What is required is an error criterion which is more in sympathy with human perception. Although much work on auditory perception is in progress, no satisfactory error criterion has yet emerged. In the meantime, a popular but not totally satisfactory method is the use of a perceptual weighting filter in A-by-S schemes. This weighting filter is given by

$$ W(z) = \frac{A(z)}{A(z/\gamma)} = \frac{1 - \sum_{i=1}^{p} a_i z^{-i}}{1 - \sum_{i=1}^{p} a_i \gamma^{i} z^{-i}} $$

The factor $\gamma$ does not alter the centre frequency of the formants, but just broadens their bandwidth by $\Delta f$, given by

$$ \Delta f = -\frac{f_s}{\pi} \ln \gamma \quad \text{(Hz)} $$

where $f_s$ is the sampling frequency in hertz.

The amount of de-emphasis is controlled by $\gamma$, which introduces a broadening effect and must lie between 0 and 1. The most suitable value of $\gamma$ is selected subjectively by listening tests, and for 8 kHz sampling it is usually around 0.8-0.9.
[Figure: weighting filter spectra compared with the original speech envelope]
[Figure: A-by-S LPC coding scheme with perceptually weighted filter]
[Figure: Simple Style]

The DOD 4.8 kb/s Standard (CELP)
Proposed Federal Standard 1016. Joseph P. Campbell Jr. et al.

CELP Algorithm Description
CELP coding is a frame-oriented technique that breaks a sampled input signal into blocks of samples (i.e., vectors) that are processed as one unit. CELP coding is based on A-by-S search procedures, perceptually weighted vector quantization (VQ), and linear prediction (LP). A 10th-order LP filter is used to model the speech signal's short-term formant structure. Long-term signal periodicity is modeled by an adaptive codebook VQ (also called pitch VQ because it often follows the speaker's pitch in voiced speech). The error from the short-term LP and pitch VQ is vector quantized using a fixed stochastic codebook. The optimal scaled excitation vectors from the adaptive and stochastic codebooks are selected by minimizing a time-varying, perceptually weighted distortion measure that improves subjective speech quality by exploiting masking properties of human hearing.

An 8 kHz sample rate and a 30 ms frame size with four 7.5 ms subframes are used. CELP analysis consists of three basic functions:
- short-term linear prediction
- long-term adaptive codebook search
- stochastic codebook search
CELP synthesis consists of the corresponding three synthesis functions performed in reverse order, with the optional addition of a fourth function, called a postfilter, to enhance the output speech. The transmitted CELP parameters are the stochastic code index and gain, the adaptive codebook index and gain, and 10 line spectral parameters (LSP) from the short-term linear prediction.

[Figure: synthesizer; voiced: pitch, unvoiced: white noise]
[Figure: analyzer techniques]

Code Book Search Methods
The search procedures for the stochastic and adaptive codebooks are virtually identical, differing only in their codebooks and target vectors. To reduce computation, a sequential two-stage search of the codebooks is performed. The target for the first-stage adaptive codebook search is the weighted linear prediction residual plus the encoding errors introduced in previous frames that affect the present frame. The second-stage stochastic codebook search target is the first-stage target minus the filtered adaptive codebook VQ excitation.

Let the L = 60 dimensional row vectors $s$, $\hat{s}$, and $e$ represent the original speech signal, the synthetic speech signal, and the weighted error signal, respectively.
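Let $v$ represent the excitation vector being searched for in the present stage and let $u$ be the excitation vector from the previous stage. The excitation sequence for a codebook of size N within a subframe of size L is characterized by a codebook index $i$ and a corresponding optimized gain parameter $g_i$. The excitation vector can be written as

$$ v^{(i)} = g_i\, x^{(i)} \tag{3.1} $$

where $x^{(i)}$ is the codebook vector and the superscript is its codebook index.

Let $H$ and $W$ be L × L matrices whose j-th rows contain the truncated response to a unit impulse of the LP filter and of the error weighting filter, respectively. The synthetic speech can be expressed as the LP filter's zero-input response $\hat{s}^{(0)}$ plus the convolution of the LP filter's excitation and impulse response:

$$ \hat{s}^{(i)} = \hat{s}^{(0)} + \left(u + v^{(i)}\right) H \tag{3.2} $$

The weighted error signal is

$$ e^{(i)} = \left(s - \hat{s}^{(i)}\right) W \tag{3.3} $$

The target is

$$ e^{(0)} = \left(s - \hat{s}^{(0)}\right) W - u H W \tag{3.4} $$

Thus, the weighted error is the target minus the scaled filtered codeword:

$$ e^{(i)} = e^{(0)} - g_i\, y^{(i)} \tag{3.5} $$

where the filtered codeword is

$$ y^{(i)} = x^{(i)} H W \tag{3.6} $$

Let $E_i$ represent the norm, or total squared error, for codeword $i$ (T denotes transpose):

$$ E_i = e^{(i)} e^{(i)T} = e^{(0)} e^{(0)T} - 2 g_i\, y^{(i)} e^{(0)T} + g_i^2\, y^{(i)} y^{(i)T} \tag{3.7} $$

$E_i$ is a function of both the gain factor $g_i$ and the index $i$. For a given value of $i$, the optimal gain can be computed by setting the derivative of $E_i$ with respect to the unknown gain value to zero:

$$ \frac{\partial E_i}{\partial g_i} = -2\, y^{(i)} e^{(0)T} + 2 g_i\, y^{(i)} y^{(i)T} = 0 \tag{3.8} $$

Therefore, the optimum gain is the ratio of the cross-correlation of the target and filtered codeword to the energy of the filtered codeword:

$$ g_i = \frac{y^{(i)} e^{(0)T}}{y^{(i)} y^{(i)T}} \tag{3.9} $$

The gain can now be quantized to jointly optimize the search for gain and index:

$$ \hat{g}_i = Q\left[g_i\right] \tag{3.10} $$

Our objective is to minimize the squared error at the receiver, given by Eq. (3.7) with $\hat{g}_i$ substituted for $g_i$. Minimizing $E_i$ with respect to $i$ is equivalent to maximizing the negative of the last two terms in Eq. (3.7), because the first term is independent of the codeword. This corresponds to maximizing the match score

$$ m_i = 2 \hat{g}_i\, y^{(i)} e^{(0)T} - \hat{g}_i^2\, y^{(i)} y^{(i)T} \tag{3.11} $$

If the gain quantization is ignored, Eq. (3.9) can be substituted into Eq. (3.11), and the match score approximation is the familiar normalized squared cross-correlation:

$$ m_i \approx \frac{\left(y^{(i)} e^{(0)T}\right)^2}{y^{(i)} y^{(i)T}} \tag{3.12} $$

Thus, as shown in Eq. (3.11), the codebook search procedure for the MSPE excitation sequence for A-by-S weighted gain-shape vector quantization is to find the codeword and gain that maximize the match score. To reduce computational complexity, the searches are performed sequentially: first using the adaptive codebook and its target, followed by the stochastic codebook and its target. The optimal excitation vectors are entirely characterized by their indices and corresponding gain factors.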
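The search described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Federal Standard 1016 reference code. The function name `search_codebook` and the toy four-sample vectors are hypothetical. The codewords passed in are assumed to be already filtered (the "filtered codeword" of Eq. 3.6). Gain quantization is ignored, so the score is the normalized squared cross-correlation of Eq. 3.12 and the gain is the ratio of Eq. 3.9.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def search_codebook(target, filtered_codewords):
    """Return (index, gain, score) of the filtered codeword that best matches the target.

    score = (y . target)^2 / (y . y)   normalized squared cross-correlation
    gain  = (y . target)   / (y . y)   optimal unquantized gain
    """
    best = None
    for i, y in enumerate(filtered_codewords):
        energy = dot(y, y)
        if energy == 0.0:
            continue  # an all-zero codeword cannot reduce the error
        corr = dot(y, target)
        score = corr * corr / energy
        if best is None or score > best[2]:
            best = (i, corr / energy, score)
    return best


# Toy example (assumed data): codeword 1 is parallel to the target.
target = [1.0, 2.0, 3.0, 4.0]
filtered_codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 1.0, 1.5, 2.0],
    [1.0, -1.0, 1.0, -1.0],
]
index, gain, score = search_codebook(target, filtered_codebook)

# Remaining squared error: target energy minus the match score.
residual_error = dot(target, target) - score
```

In a two-stage search, the same function would be called twice: once with the adaptive codebook, then with the stochastic codebook after subtracting `gain` times the chosen filtered codeword from the target.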
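The two "compute the bit rate" exercises on the bit-allocation slides reduce to bits per frame divided by frame duration. The sketch below works them through, assuming the 8 kHz sampling rate stated on the linear-prediction slide. The function and dictionary names are illustrative only.

```python
SAMPLE_RATE_HZ = 8000  # sampling rate given on the slides


def bit_rate(bits_per_frame, samples_per_frame, sample_rate_hz=SAMPLE_RATE_HZ):
    """Bit rate in b/s: bits per frame divided by the frame duration in seconds."""
    frame_seconds = samples_per_frame / sample_rate_hz
    return bits_per_frame / frame_seconds


# LPC-10 allocation from the slide: 54 bits per 180-sample frame.
lpc10_bits = {"pitch_and_voicing": 7, "rms": 5, "predictive_coefficients": 41, "sync": 1}

# Vector-quantized allocation from the slide: 45 bits per 600-sample frame.
vq_bits = {"voicing": 3, "pitch": 9, "rms": 8, "predictive_coefficients": 24, "sync": 1}

lpc10_rate = bit_rate(sum(lpc10_bits.values()), 180)
vq_rate = bit_rate(sum(vq_bits.values()), 600)

print(f"LPC-10: {lpc10_rate:.0f} b/s, vector-quantized vocoder: {vq_rate:.0f} b/s")
```

The results match the rates quoted on the slides (2.4 kb/s and 600 b/s). The lower rate comes from both fewer bits per frame and a frame more than three times as long.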
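The slides name the Levinson-Durbin recursion as the way to obtain the predictive coefficients but do not show it. The following is a minimal textbook-style sketch, not taken from the deck. It assumes the predictor convention s(n) ≈ Σ a_k s(n−k) and an autocorrelation sequence `r` with `r[0] > 0`. The function name and test signal are illustrative.

```python
def levinson_durbin(r, order):
    """Solve the LP normal equations from autocorrelation values r[0..order].

    Returns (coefficients a_1..a_order, final prediction error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] multiplies s(n - k)
    error = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for stage m.
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / error
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        updated[m] = k_m
        for k in range(1, m):
            updated[k] = a[k] - k_m * a[m - k]
        a = updated
        error *= 1.0 - k_m * k_m
    return a[1:], error


# Autocorrelation of a first-order process with correlation 0.9 (assumed test data):
# r[k] = 0.9 ** k, so the ideal predictor is a_1 = 0.9, a_2 = 0.
r = [0.9 ** k for k in range(3)]
coeffs, err = levinson_durbin(r, 2)
```

A production coder would also guard against a zero-energy (silent) frame, for which `r[0]` is zero and the division is undefined.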
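The perceptual weighting slides give the usual range of the weighting factor (0.8 to 0.9 at 8 kHz), but their formulas did not survive extraction. The sketch below assumes the commonly used form W(z) = A(z)/A(z/γ), in which the denominator coefficients are a_i·γ^i, and the standard bandwidth-expansion relation Δf = −(f_s/π)·ln γ. Both are assumptions here, as are the function names and the second-order example predictor.

```python
import math


def expand_bandwidth(lp_coeffs, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i for i = 1..p."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return [a_i * gamma ** i for i, a_i in enumerate(lp_coeffs, start=1)]


def formant_broadening_hz(gamma, fs_hz):
    """Formant bandwidth increase in Hz, assuming delta_f = -(fs / pi) * ln(gamma)."""
    return -(fs_hz / math.pi) * math.log(gamma)


lp_coeffs = [1.2, -0.5]  # hypothetical 2nd-order predictor, for illustration only
weighted = expand_bandwidth(lp_coeffs, 0.9)

for gamma in (0.8, 0.9):
    print(f"gamma={gamma}: formants broadened by about "
          f"{formant_broadening_hz(gamma, 8000.0):.0f} Hz")
```

Under these assumptions, a smaller γ broadens the formants more, so more of the coding noise is allowed to sit under the formant peaks, where it is better masked.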
