
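Key Technologies and Applications of Low- and Medium-Rate Speech Coding
1. Linear prediction
2. Analysis-by-synthesis; perceptually weighted filter
3. The US government standard 4.8 kb/s code-excited linear predictive coding algorithm
4. International speech compression standards using code-excited and multi-pulse linear predictive coding

LOW AND MEDIUM BIT-RATE SPEECH CODING
1. Linear Predictive Coding Algorithm
1.1 Model of Speech Production
1.2 Government Standard Linear Predictive Coding Algorithm: LPC-10
1.3 Adaptive Predictive Coding (APC)
1.4 Residual Excited Linear Predictive Coding (RELP)
2. Analysis-by-Synthesis Coding of Speech
2.1 Analysis-by-Synthesis LPC (A-by-S LPC)
2.2 Perceptually Weighted Filter
3. The US DoD 4.8 kb/s Standard: CELP
4. Generalized A-by-S LPC Coder with Different Excitation Types: International Standards with Code(book)-Excited Linear Predictive Coding (CELP) and Multi-Pulse LPC (MPLPC)

1. Linear Predictive Coding Algorithm

1.1 Model of Speech Production
Low bit-rate speech coding relies on a model of speech production, shown in Figure 1.

[Figure 1. Block diagram of the simplified model of speech production: a voiced/unvoiced switch selects either an impulse train (spaced by the pitch period, PP) or a noise sequence as the excitation of a digital filter H(z), which imposes the spectral envelope and outputs the speech samples s(n).]

In general the filter has both zeros and poles:

$$ H(z) = G\,\frac{1 + \sum_{i=1}^{M} b_i z^{-i}}{1 - \sum_{j=1}^{N} a_j z^{-j}} \tag{1.1} $$

For an all-pole model this reduces to

$$ H(z) = \frac{G}{A(z)} = \frac{G}{1 - \sum_{j=1}^{N} a_j z^{-j}} \tag{1.2} $$

Transforming Equation (1.2) into the sampled time domain, we obtain

$$ s(n) = G\,x(n) + \sum_{j=1}^{N} a_j\, s(n-j) \tag{1.3} $$

Equation (1.3) is the well-known LPC difference equation.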
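As an illustration, the difference equation can be run directly as a recursive synthesis filter. The sketch below is a minimal pure-Python version under stated assumptions: the function names `lpc_synthesis` and `lpc_inverse_filter`, the zero initial filter state and the second-order coefficients are hypothetical choices for the example, not values from the text. Passing the synthesized output back through the inverse filter A(z) recovers the scaled excitation Gx(n).

```python
def lpc_synthesis(x, a, gain=1.0):
    """All-pole LPC synthesis, Eq. (1.3): s(n) = G*x(n) + sum_{j=1..N} a_j * s(n-j).

    x    : excitation samples x(n)
    a    : predictor coefficients [a_1, ..., a_N]
    gain : the gain G
    Output samples before n = 0 are taken as zero (assumed zero initial state).
    """
    s = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for j, aj in enumerate(a, start=1):
            if n - j >= 0:
                acc += aj * s[n - j]
        s.append(acc)
    return s


def lpc_inverse_filter(s, a):
    """Inverse filter A(z) = 1 - sum_j a_j z^-j: returns the residual G*x(n)."""
    return [
        s[n] - sum(aj * s[n - j] for j, aj in enumerate(a, start=1) if n - j >= 0)
        for n in range(len(s))
    ]


a = [1.3, -0.64]            # hypothetical 2nd-order coefficients (complex poles at radius 0.8)
x = [1.0] + [0.0] * 7       # a single excitation pulse
s = lpc_synthesis(x, a, gain=0.5)
residual = lpc_inverse_filter(s, a)
print([round(v, 4) for v in s])
print([round(v, 6) for v in residual])
```

The synthesis loop is deliberately direct rather than vectorised, so each line maps onto one term of Equation (1.3).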

Equation (1.3) states that the value of the present output, s(n), may be determined by summing the weighted present input, Gx(n), and a weighted sum of the past output samples.

1.2 Government Standard Linear Predictive Coding Algorithm: LPC-10 (by Thomas E. Tremain)
The LPC-10 algorithm is based on linear predictive coding. Its parameters are:
(1) the predictive coefficients a_i, i = 1, ..., 10;
(2) the pitch period;
(3) the energy (RMS);
(4) the voicing decision (voiced/unvoiced).

Bit allocation per frame:
Pitch/voicing: 7 bits
RMS: 5 bits
Sync: 1 bit
Predictive coefficients: 41 bits
Total per frame (180 samples): 54 bits
Bit rate: 2.4 kb/s

At the 8 kHz sampling rate a 180-sample frame lasts 22.5 ms, so 54 bits per frame corresponds to 54 / 0.0225 s = 2400 b/s.

1.3 Adaptive Predictive Coding (APC)
The APC scheme shown in Figure 2 was originally proposed by Atal and Schroeder. It employs both a short-term predictor (STP) and a pitch, or long-term, predictor (LTP). The excitation signal that remains after inverse filtering is scalar-quantised on a sample-by-sample basis. APC schemes have been proposed for 16 kb/s and below, with variations in how the residual signal is treated. The INMARSAT-B standard system employs a 16 kb/s APC and is used for maritime mobile communications.

[Figure 2. Block diagram of adaptive predictive coder. Encoder: quantiser for the residual r(n), pitch and LPC predictors, estimation and quantisation of predictor coefficients, side-information encoder, MUX to channel. Decoder: DEMUX from channel, side-information decoder, pitch and LPC predictors producing the decoded speech.]

The STP models the short-term correlation in the speech signal (the spectral envelope) and has the form

$$ \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}} \tag{1.4a} $$

$$ s(n) = x(n) + \sum_{j=1}^{P} a_j\, s(n-j) \tag{1.4b} $$

where a_i are the STP coefficients, P is the filter order and x(n) is the excitation.

The LTP can be interpreted as

$$ \frac{1}{P(z)} = \frac{1}{1 - \sum_{i=1}^{I} b_i z^{-(D+i)}} \tag{1.5a} $$

$$ s(n) = x(n) + \sum_{i=1}^{I} b_i\, s(n-D-i) \tag{1.5b} $$

where D is the "pitch period", the b_i are the "pitch gain" coefficients, which reflect the amount of correlation between the distant samples, and x(n) is the excitation.

Our aim is to model the long-term correlation in the speech (the fine spectral structure) that is left in the residual signal after LPC inverse filtering, such that when the model parameters are used in a filter, it removes as much of the long-term correlation as possible, i.e., it spectrally flattens the signal.
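The long-term predictor of Equation (1.5b) can be sketched the same way. The example below is a hypothetical illustration, not the method of any particular coder: it uses a one-tap predictor (I = 1) and a simple open-loop lag search that maximizes the normalized correlation of the residual, a common choice that the text does not prescribe. Because (1.5b) indexes past samples as s(n − D − i), a one-tap predictor with total delay M corresponds to D = M − 1. The helper names, the lag range 20 to 147 and the pulse-train residual are assumptions.

```python
def ltp_synthesis(x, b, D):
    """Long-term (pitch) synthesis 1/P(z), Eq. (1.5b):
    s(n) = x(n) + sum_{i=1..I} b_i * s(n - D - i), with zero initial state."""
    s = []
    for n, xn in enumerate(x):
        acc = xn
        for i, bi in enumerate(b, start=1):
            if n - D - i >= 0:
                acc += bi * s[n - D - i]
        s.append(acc)
    return s


def ltp_inverse(s, b, D):
    """Long-term inverse filter P(z): removes the pitch correlation from s(n)."""
    return [
        s[n] - sum(bi * s[n - D - i] for i, bi in enumerate(b, start=1) if n - D - i >= 0)
        for n in range(len(s))
    ]


def best_one_tap_lag(r, min_lag, max_lag):
    """Hypothetical open-loop one-tap search: choose the delay M maximising
    (sum r(n) r(n-M))^2 / sum r(n-M)^2 over positively correlated lags,
    and return M with the optimal gain corr / energy."""
    best_score, best_lag, best_gain = 0.0, None, 0.0
    for M in range(min_lag, max_lag + 1):
        corr = sum(r[n] * r[n - M] for n in range(M, len(r)))
        energy = sum(r[n - M] ** 2 for n in range(M, len(r)))
        if corr > 0 and energy > 0 and corr * corr / energy > best_score:
            best_score, best_lag, best_gain = corr * corr / energy, M, corr / energy
    return best_lag, best_gain


# Hypothetical residual: a pulse train with a 40-sample period (5 ms at 8 kHz).
r = [1.0 if n % 40 == 0 else 0.0 for n in range(200)]
M, b1 = best_one_tap_lag(r, 20, 147)
D = M - 1                                # one tap (I = 1): s(n - D - 1) = s(n - M)
flat = ltp_inverse(r, [b1], D)           # long-term correlation removed
rebuilt = ltp_synthesis(flat, [b1], D)   # pitch structure restored
print(M, b1, sum(1 for v in flat if v != 0.0))
```

Inverse filtering the pulse train with the estimated lag leaves a single pulse, which is the spectral flattening described above; passing that result back through the synthesis filter restores the periodic structure.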

There are no obvious reasons why we must use the residual, rather than the original signal, to model the long-term correlation in the speech signal, as long as the effect of the formants is taken into account when determining the long-term delay (pitch) in the model. The pitch predictor, or long-term predictor (LTP), came before the LPC, or short-term predictor (STP); however, the order of the LTP and STP is not too critical if the combination is carefully optimized. The bulk of the coding capacity in APC schemes is occupied by the coding of the residual signal, hence their drop in performance at lower rates.

1.4 Residual Excited Linear Predictive Coding (RELP)
To reduce the capacity required for the residual, baseband coders in the form of residual excited linear predictive coding (RELP) have been studied. As Figure 3 illustrates, the RELP coder is essentially an APC, but only a portion (the low-frequency part) of the residual signal is transmitted, typically with decimation factors of 3 to 4. The motivation behind RELP is that the residual information can be assumed to be concentrated in the low-frequency region, the baseband, so encoding only this segment reduces the coding capacity.
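A rough sketch of the baseband idea follows, under explicit assumptions: a decimation factor of 4, a 31-tap Hamming-windowed-sinc low-pass filter at the encoder, and zero-insertion up-sampling at the decoder. Zero insertion without an interpolation filter mirrors the baseband spectrum into the upper bands (spectral folding), which is only one simple way to regenerate the high frequencies; quantisation of the baseband is omitted. All function names are hypothetical.

```python
import math
import random


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed-sinc low-pass FIR; cutoff is a fraction of the sampling rate (0 < cutoff < 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(x, taps):
    """Causal FIR filtering with zero initial state."""
    return [sum(taps[k] * x[n - k] for k in range(len(taps)) if n - k >= 0) for n in range(len(x))]


def relp_encode_baseband(residual, factor, num_taps=31):
    """Encoder: low-pass the residual to its baseband, then keep every `factor`-th sample."""
    taps = lowpass_fir(num_taps, 0.5 / factor)
    return fir_filter(residual, taps)[::factor]


def relp_regenerate(baseband, factor):
    """Decoder: zero-insertion up-sampling; the missing interpolation filter lets the
    baseband spectrum fold into the high band (a crude high-frequency regeneration)."""
    full_band = []
    for v in baseband:
        full_band.append(v * factor)           # compensate the 1/factor gain of zero insertion
        full_band.extend([0.0] * (factor - 1))
    return full_band


random.seed(0)
residual = [random.gauss(0.0, 1.0) for _ in range(160)]  # hypothetical 20 ms residual at 8 kHz
baseband = relp_encode_baseband(residual, factor=4)       # 40 samples to be quantised and sent
regenerated = relp_regenerate(baseband, factor=4)         # back to 160 samples
print(len(residual), len(baseband), len(regenerated))
```

With a factor of 4 at 8 kHz, the low-pass cutoff of 0.5/4 of the sampling rate is 1 kHz, the Nyquist frequency of the decimated baseband.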

At the encoder, the baseband signal is extracted by low-pass filtering and then quantised; at the decoder, the baseband is up-sampled and combined with high-frequency regeneration to give a full-band signal. The main attribute of the RELP scheme is its ability to operate even under extreme background-noise conditions. In general, however, its subjective performance is acceptable only at 9.6 kb/s and above.

[Figure 3. Block diagram of RELP coder. Encoder: LPC inverse filter, pitch inverse filter, baseband extraction and decimation, quantiser, LPC and pitch analysis with parameter encoders, MUX to channel. Decoder: DEMUX from channel, dequantiser, high-frequency regeneration, pitch and LPC synthesis filters, parameter decoders.]

The methods of speech coding described so far are based on an analysis-and-synthesis procedure: the speech signal is analysed to extract the parameters used to remove speech redundancies, and the remaining signal is then quantised. At the decoder the reverse is performed: the quantised residual signal is synthesized with the extracted parameters added back in. This route of coding separates parameter extraction from quantisation, so control over the distortions introduced is limited to the individual subsystems. To gain better control over the whole encoding process, i.e., to minimize the total error in the synthetic speech signal, analysis-by-synthesis (A-by-S) techniques have been proposed. In A-by-S methods, the subsystems are jointly optimized so that the overall synthetic speech has minimum distortion. This is achieved by having a local decoder at the transmitter, so that the synthetic speech is available for analysis.

2. Analysis-by-Synthesis Coding of Speech
Coders operating between 4.8 and 16 kb/s fall into two classes, namely analysis-and-synthesis (A-and-S) schemes and analysis-by-synthesis (A-by-S) schemes. Although A-and-S schemes such as RELP, APC, ATC and SBC have been very successful at rates around 9.6-16 kb/s, below 9.6 kb/s they can no longer produce good-quality speech. There are two main reasons for their shortcomings:
1) the coded speech is not analysed to see whether the coding procedure is operating efficiently, i.e., there is no check on or control over the distortion of the reconstructed speech;
2) in adaptive schemes, the errors accumulated in previous frames are not considered in the analysis of the current frame, so errors propagate into the following frames without any form of resetting.

The A-by-S method is not unique to speech coding; it is a general technique used in other areas of estimation and identification. The basic idea behind A-by-S is as follows. First, it is assumed that the signal can be observed and represented in some form, e.g., in the time or frequency domain. Then a theoretical model of signal production is assumed. The model has a number of parameters that can be varied to produce different ranges of the observable signal. To derive a representation of the model that has the same form as the true signal model, a trial-and-error procedure can be applied: by varying the parameters of the model systematically, it is possible to find a set of parameters that produces a synthetic signal matching the real signal with minimum error (assuming the model is valid to begin with). When such a match is found, the parameters of the model are assumed to be the parameters of the true signal.

2.1 Analysis-by-Synthesis LPC (A-by-S LPC)
A-by-S schemes, particularly A-by-S LPC schemes, address the two main shortcomings of A-and-S schemes. In A-by-S LPC coding systems, a closed-loop optimization procedure is used to determine the excitation signal which, when used to excite the model filter, produces a perceptually optimum synthesized speech signal. It is this closed-loop approach that enables A-by-S LPC schemes to be far more successful at 4.8-9.6 kb/s than conventional A-and-S schemes such as APC and RELP. The basic structure of an A-by-S LPC coding system is illustrated in Figure 4.

[Figure 4. Block diagram of A-by-S LPC coding scheme. Encoder: excitation generator, pitch synthesis filter 1/P(z), LPC synthesis filter 1/A(z) producing the synthesized speech, error e(n) against the original speech s(n), error minimisation. Decoder: excitation generator, pitch synthesis filter, LPC synthesis filter, output speech.]

This coder minimizes the error between the original signal s(n) and the synthesized signal ŝ(n), according to a suitable error criterion, by varying the excitation signal and the STP and LTP filters. This is achieved via a sequential procedure: first the time-varying filter parameters are determined, then, with these fixed, the excitation is optimized.

2.2 Perceptually Weighted Filter
The A-by-S LPC coder minimizes the error between the original and synthesized signals according to a suitable criterion. However, at low bit rates there is only one bit or less per sample, so it is more difficult to match the waveform closely. What is required is an error criterion more in sympathy with human perception. Although much work on auditory perception is in progress, no satisfactory error criterion has yet emerged. In the meantime, a popular but not totally satisfactory method is to use a perceptual weighting filter in A-by-S schemes. This weighting filter is given by

$$ W(z) = \frac{A(z)}{A(z/\gamma)} = \frac{1 - \sum_{i=1}^{P} a_i z^{-i}}{1 - \sum_{i=1}^{P} a_i \gamma^{i} z^{-i}} \tag{2.1} $$

The factor γ does not alter the centre frequencies of the formants; it only broadens the formant bandwidths by Δf, given by

$$ \Delta f = -\frac{f_s}{\pi} \ln \gamma \quad \text{Hz} \tag{2.2} $$

where f_s is the sampling frequency in hertz.

[Figure 5. Typical plots of weighting filter spectra compared with the original speech envelope.]

As can be seen from Figure 5, the weighting filter de-emphasizes the frequency regions corresponding to the formants, as determined by the LPC analysis. By allowing larger distortion in the formant regions, noise in the formant nulls, which is subjectively more disturbing, can be reduced. The amount of de-emphasis is controlled by γ, which introduces a broadening effect and must lie between 0 and 1. The most suitable value of γ is selected subjectively by listening tests; for 8 kHz sampling, γ is usually around 0.8-0.9. A modified A-by-S LPC coding scheme with the perceptually weighted filter is illustrated in Figure 6.

[Figure 6. A-by-S LPC coding scheme with perceptually weighted filter: the original speech s(n) passes through W(z) to give s_W(n); the excitation generator drives 1/P(z) and 1/A_W(z); the error e(n) feeds the error minimisation.]
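The coefficients of W(z) in Equation (2.1) follow directly from the LPC coefficients: the numerator is A(z) itself, and the denominator scales the i-th coefficient by γ^i. The sketch below, with hypothetical helper names and illustrative second-order coefficients, builds W(z), applies it with a direct-form difference equation, and evaluates the bandwidth broadening of Equation (2.2).

```python
import math


def weighting_filter_coeffs(a, gamma):
    """W(z) = A(z) / A(z/gamma), Eq. (2.1), with A(z) = 1 - sum_i a_i z^-i.
    Returns (numerator, denominator) coefficients in ascending powers of z^-1."""
    num = [1.0] + [-ai for ai in a]
    den = [1.0] + [-ai * gamma ** i for i, ai in enumerate(a, start=1)]
    return num, den


def iir_filter(num, den, x):
    """Direct-form difference equation with den[0] == 1:
    y(n) = sum_k num[k] x(n-k) - sum_{k>=1} den[k] y(n-k), zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


def bandwidth_broadening(gamma, fs):
    """Eq. (2.2): increase in formant bandwidth (Hz) caused by replacing z with z/gamma."""
    return -fs / math.pi * math.log(gamma)


a = [1.3, -0.64]                       # hypothetical 2nd-order LPC coefficients
num, den = weighting_filter_coeffs(a, gamma=0.8)
impulse = [1.0] + [0.0] * 19
h = iir_filter([1.0], [1.0] + [-ai for ai in a], impulse)   # impulse response of 1/A(z)
weighted = iir_filter(num, den, h)     # W(z)/A(z) = 1/A(z/gamma), i.e. gamma**n * h(n)
for g in (0.8, 0.9):
    print(g, round(bandwidth_broadening(g, 8000.0), 1))
```

With f_s = 8000 Hz, γ = 0.8 broadens each formant by roughly 568 Hz and γ = 0.9 by roughly 268 Hz. The last filtering step doubles as a check: W(z) cascaded with 1/A(z) equals 1/A(z/γ), whose impulse response is γ^n h(n).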

In multi-pulse LPC (MPLPC) and regular-pulse excited LPC (RPELPC), the LTP can be omitted, because the excitation alone can generate the pitch pulses.

3. The DoD 4.8 kb/s Standard: CELP (Proposed Federal Standard 1016)
Joseph P. Campbell, Jr., et al.

CELP Algorithm Description
CELP coding is a frame-oriented technique that breaks a sampled input signal into blocks of samples (i.e., vectors) that are processed as one unit. CELP coding is based on A-by-S search procedures, perceptually weighted vector quantization (VQ) and linear prediction (LP); a 10th-order LP filter models the short-term formant structure. Long-term signal periodicity is modeled by an adaptive code book VQ (also called pitch VQ because it often follows the speaker's pitch in voiced speech). The error from the short-term LP and pitch VQ is vector quantized using a fixed stochastic code book. The optimal scaled excitation vectors from the adaptive and stochastic code books are selected by minimizing a time-varying, perceptually weighted distortion measure that improves subjective speech quality by exploiting the masking properties of human hearing.

An 8 kHz sample rate and a 30 ms frame size with four 7.5 ms subframes are used; at 8 kHz a frame therefore holds 240 samples and each subframe 60 samples. CELP analysis consists of three basic functions:
1) short-term linear prediction;
2) long-term adaptive code book search;
3) stochastic code book search.
CELP synthesis consists of the corresponding three synthesis functions performed in reverse order, with the optional addition of a fourth function, a postfilter, to enhance the output speech. The transmitted CELP parameters are the stochastic code book index and gain, the adaptive code book index and gain, and 10 line spectral parameters (LSPs).

[Figure 7. CELP. Synthesizer: stochastic code book (512 × 60) with gain g_s and adaptive code book (256 × 60, delayed by one subframe) with gain g_a excite the linear predictor to give synthetic speech; an adaptive postfilter gives enhanced output speech. Analysis: the error between input and synthetic speech passes through a perceptual weighting filter to the error minimisation.]

Code Book Search Methods
The search procedures for the stochastic and adaptive code books are virtually identical, differing only in their code books and target vectors. To reduce computation, a sequential two-stage search of the code books is performed. The target for the first-stage adaptive code book search is the weighted linear prediction residual plus the encoding errors introduced in previous frames that affect the present frame. The second-stage stochastic code book search target is the first-stage target minus the filtered adaptive code book VQ excitation.

Let the L = 60 dimensional row vectors s, ŝ and e represent the original speech signal, the synthetic speech signal and the weighted error signal, respectively. Let v represent the excitation vector being searched for in the present stage and let u be the excitation vector from the previous stage. The excitation sequence for a code book of size N within a subframe of size L is characterized by a code book index i, 1 ≤ i ≤ N, and a corresponding optimized gain parameter g^(i). The excitation vector v^(i) can be written as

$$ v^{(i)} = g^{(i)} x^{(i)} \tag{3.1} $$

where x^(i) is the code book vector and the superscript is its code book index.

Let H and W be L × L matrices whose j-th rows contain the truncated impulse response caused by a unit impulse at time j of the LP filter and the error weighting filter, respectively. As shown in Figure 7, the synthetic speech can be expressed as the LP filter zero-input response ŝ^(0) plus the convolution of the LP filter excitation with its impulse response:

$$ \hat{s}^{(i)} = \hat{s}^{(0)} + \left(u + v^{(i)}\right) H, \qquad 1 \le i \le N \tag{3.2} $$

where u is the scaled adaptive excitation vector (the zero vector in the first-stage search). The weighted error signal is

$$ e^{(i)} = \left(s - \hat{s}^{(i)}\right) W = e^{(0)} - v^{(i)} H W \tag{3.3} $$

with the target

$$ e^{(0)} = \left(s - \hat{s}^{(0)}\right) W - u H W \tag{3.4} $$

Thus the weighted error e^(i) is the target minus the scaled, filtered codeword:

$$ e^{(i)} = e^{(0)} - g^{(i)} y^{(i)} \tag{3.5} $$

where y^(i) is the filtered codeword

$$ y^{(i)} = x^{(i)} H W \tag{3.6} $$

Let E^(i) represent the norm, or total squared error, for codeword i:

$$ E^{(i)} = e^{(i)} e^{(i)T} = e^{(0)} e^{(0)T} - 2 g^{(i)} e^{(0)} y^{(i)T} + \left(g^{(i)}\right)^{2} y^{(i)} y^{(i)T} \tag{3.7} $$

where T denotes the transpose. E^(i) is a function of both the gain g^(i) and the index i. For a given value of i, the optimal gain can be computed by setting the derivative of E^(i) with respect to the unknown gain to zero:

$$ \frac{\partial E^{(i)}}{\partial g^{(i)}} = -2\, e^{(0)} y^{(i)T} + 2 g^{(i)} y^{(i)} y^{(i)T} = 0 \tag{3.8} $$

Therefore, the optimum gain is the ra
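Solving Equation (3.8) for the gain gives g^(i) = e^(0) y^(i)T / (y^(i) y^(i)T); substituting it back into (3.7) gives E^(i) = e^(0) e^(0)T − (e^(0) y^(i)T)^2 / (y^(i) y^(i)T), so the best codeword is the one that maximizes (e^(0) y^(i)T)^2 / (y^(i) y^(i)T). The sketch below implements this exhaustive search for one stage in pure Python. It is an illustration under assumptions, not the FS-1016 reference code: the subframe length L = 8 (the standard uses L = 60), the Gaussian code book and the impulse responses of the LP and weighting filters are hypothetical toy values, the helper names are made up, and indices are 0-based rather than 1 ≤ i ≤ N.

```python
import random


def impulse_matrix(h, L):
    """L x L matrix whose j-th row is the impulse response h delayed by j samples and
    truncated to the subframe, so that (row vector) x @ M is x convolved with h."""
    return [[h[k - j] if 0 <= k - j < len(h) else 0.0 for k in range(L)] for j in range(L)]


def vec_mat(x, M):
    """Row vector times matrix."""
    return [sum(x[j] * M[j][k] for j in range(len(x))) for k in range(len(M[0]))]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def search_codebook(codebook, e0, H, W):
    """One search stage: return (index, gain, error) minimising E(i), Eqs. (3.5)-(3.8)."""
    best = None
    e0_energy = dot(e0, e0)
    for i, x in enumerate(codebook):
        y = vec_mat(vec_mat(x, H), W)                # filtered codeword y(i) = x(i) H W, Eq. (3.6)
        y_energy = dot(y, y)
        if y_energy == 0.0:
            continue                                 # an all-zero codeword cannot reduce the error
        corr = dot(e0, y)
        gain = corr / y_energy                       # solves Eq. (3.8)
        error = e0_energy - corr * corr / y_energy   # Eq. (3.7) at the optimal gain
        if best is None or error < best[2]:
            best = (i, gain, error)
    return best


# --- Toy example (all values hypothetical) ---
L = 8
lp_h = [1.0, 1.3]                    # impulse response of 1/A(z) with a = [1.3, -0.64]
for n in range(2, L):
    lp_h.append(1.3 * lp_h[n - 1] - 0.64 * lp_h[n - 2])
H = impulse_matrix(lp_h, L)
W = impulse_matrix([1.0, -0.5], L)   # stand-in weighting-filter impulse response
random.seed(1)
codebook = [[random.gauss(0.0, 1.0) for _ in range(L)] for _ in range(16)]
e0 = [0.7 * v for v in vec_mat(vec_mat(codebook[5], H), W)]   # target built from codeword 5
index, gain, error = search_codebook(codebook, e0, H, W)
print(index, round(gain, 6), round(error, 9))
```

Because the toy target is an exact scaled copy of one filtered codeword, the search should return that index with gain 0.7 and near-zero error; with a real target the minimum error is generally positive.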
