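Chapter 5 Gradient Estimation and Its Effects on Adaptation

In Chapter 4 we assumed that an exact measurement of the gradient vector required by the adaptive process was available at each iteration. In most applications, however, an exact measurement is not available, and an estimate based on a limited statistical sample must be used.

GRADIENT COMPONENT ESTIMATION BY DERIVATIVE MEASUREMENT

For the one-weight quadratic performance surface

$$\xi = \xi_{\min} + \lambda v^2,$$

the derivatives are

$$\frac{d\xi}{dv} = 2\lambda v, \qquad \frac{d^2\xi}{dv^2} = 2\lambda.$$

Newton's method requires the first and second derivatives. The method of steepest descent requires only the first derivative.

The derivatives are estimated numerically by taking “central differences”:

$$\frac{d\xi}{dv} \approx \frac{\xi(v+\delta) - \xi(v-\delta)}{2\delta}, \qquad \frac{d^2\xi}{dv^2} \approx \frac{\xi(v+\delta) - 2\xi(v) + \xi(v-\delta)}{\delta^2}.$$

These approximations become exact as $\delta$ approaches zero. For quadratic performance surfaces they are exact for any $\delta$, since we have

$$\frac{\xi(v+\delta) - \xi(v-\delta)}{2\delta} = \frac{\lambda(v+\delta)^2 - \lambda(v-\delta)^2}{2\delta} = 2\lambda v,$$

$$\frac{\xi(v+\delta) - 2\xi(v) + \xi(v-\delta)}{\delta^2} = \frac{\lambda(v+\delta)^2 - 2\lambda v^2 + \lambda(v-\delta)^2}{\delta^2} = 2\lambda.$$

THE PERFORMANCE PENALTY

Taking the measurements at $v-\delta$ and $v+\delta$ instead of at $v$ raises the average mean-square error by the performance penalty

$$\gamma = \frac{1}{2}\left[\xi(v-\delta) + \xi(v+\delta)\right] - \xi(v).$$

[Figure: the performance surface $\xi$ plotted against $v$, with $\xi(v-\delta)$ and $\xi(v+\delta)$ marked on either side of $v$.]

For the one-weight quadratic performance function $\xi = \xi_{\min} + \lambda v^2$, we have

$$\gamma = \frac{1}{2}\left[\xi_{\min} + \lambda(v-\delta)^2 + \xi_{\min} + \lambda(v+\delta)^2\right] - \xi_{\min} - \lambda v^2 = \lambda\delta^2.$$

In this result we see that $\gamma$ is constant over a given performance function. A dimensionless measure of the effect of the gradient estimate on the adaptive adjustment, called the “perturbation” $P$, can further be defined in terms of $\gamma$ as follows:

$$P = \frac{\gamma}{\xi_{\min}} = \frac{\lambda\delta^2}{\xi_{\min}}.$$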
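The central-difference estimates and the performance penalty can be checked numerically. The following is a minimal sketch, assuming a one-weight quadratic surface with hypothetical values for $\xi_{\min}$, $\lambda$, $v$ and $\delta$; the function names are illustrative and do not come from the text.

```python
def make_surface(xi_min, lam):
    """Return the one-weight quadratic surface xi(v) = xi_min + lam * v**2."""
    def xi(v):
        return xi_min + lam * v * v
    return xi


def central_differences(xi, v, delta):
    """Central-difference estimates of the first and second derivatives at v."""
    first = (xi(v + delta) - xi(v - delta)) / (2.0 * delta)
    second = (xi(v + delta) - 2.0 * xi(v) + xi(v - delta)) / (delta * delta)
    return first, second


def performance_penalty(xi, v, delta):
    """gamma: mean of the two perturbed values of xi minus the value at v."""
    return 0.5 * (xi(v - delta) + xi(v + delta)) - xi(v)


# Hypothetical parameter values, chosen only for illustration.
xi_min, lam, v, delta = 2.0, 0.5, 1.5, 0.25
xi = make_surface(xi_min, lam)

first, second = central_differences(xi, v, delta)
gamma = performance_penalty(xi, v, delta)
perturbation = gamma / xi_min  # dimensionless perturbation P

print("first derivative estimate:", first, "exact:", 2.0 * lam * v)
print("second derivative estimate:", second, "exact:", 2.0 * lam)
print("gamma:", gamma, "lam * delta**2:", lam * delta ** 2)
print("P:", perturbation)
```

Because the surface is quadratic, the estimates agree with $2\lambda v$ and $2\lambda$ for any $\delta$, and $\gamma$ does not change when `v` is varied.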

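DERIVATIVE MEASUREMENT AND PERFORMANCE PENALTIES WITH MULTIPLE WEIGHTS

A two-dimensional gradient

With two weights, the performance surface is

$$\xi = \xi_{\min} + V^T R V, \qquad V = \begin{bmatrix} v_0 \\ v_1 \end{bmatrix}, \qquad R = \begin{bmatrix} r_{00} & r_{01} \\ r_{10} & r_{11} \end{bmatrix},$$

$$\xi = \xi_{\min} + r_{00}v_0^2 + r_{11}v_1^2 + 2r_{01}v_0v_1.$$

Two-dimensional derivative measurement

Perturbing one weight at a time while the other is held fixed gives

$$\frac{\partial^2\xi}{\partial v_0^2} = \frac{\xi(v_0+\delta) - 2\xi(v_0) + \xi(v_0-\delta)}{\delta^2} = 2r_{00},$$

$$\frac{\partial^2\xi}{\partial v_1^2} = \frac{\xi(v_1+\delta) - 2\xi(v_1) + \xi(v_1-\delta)}{\delta^2} = 2r_{11}.$$

That is, $\lambda$ is replaced by $r_{00}$ along the $v_0$ coordinate and by $r_{11}$ along the $v_1$ coordinate. When the partial derivatives of this performance surface along the coordinates $v_0$ and $v_1$ are measured, the normalized performance penalties are $P_0$ and, similarly, $P_1$:

$$P_0 = \frac{r_{00}\delta^2}{\xi_{\min}}, \qquad P_1 = \frac{r_{11}\delta^2}{\xi_{\min}}.$$

The average perturbation during the entire measurement is given by

$$P = \frac{P_0 + P_1}{2} = \frac{\delta^2}{\xi_{\min}} \cdot \frac{r_{00} + r_{11}}{2}.$$

Let us now define a general perturbation for $L+1$ weights as the average of the perturbations of the individual gradient component measurements:

$$P = \frac{\delta^2}{\xi_{\min}} \cdot \frac{\operatorname{trace}(R)}{L+1} = \frac{\delta^2}{\xi_{\min}} \cdot \frac{1}{L+1}\sum_{n=0}^{L}\lambda_n.$$

With the average eigenvalue

$$\lambda_{av} = \frac{1}{L+1}\sum_{n=0}^{L}\lambda_n,$$

this becomes

$$P = \frac{\delta^2\lambda_{av}}{\xi_{\min}}.$$

VARIANCE OF THE GRADIENT ESTIMATE

The mean-square error is

$$\xi = E[\varepsilon_k^2] = E[d_k^2] + W^T R W - 2P^T W,$$

where $P$ in this expression is the cross-correlation vector, not the perturbation. The estimate $\hat{\xi}$ of the MSE is assumed to be based on $N$ samples of $\varepsilon_k^2$. We first define an unbiased estimate of the $r$th moment of $\varepsilon_k$:

$$\hat{\alpha}_r = \frac{1}{N}\sum_{k=1}^{N}\varepsilon_k^r, \qquad E[\hat{\alpha}_r] = E[\varepsilon_k^r] = \alpha_r.$$

Example: let us derive the expected fourth moment, $\alpha_4$, under the assumption that the probability density of $\varepsilon_k$ is normal with mean equal to zero and standard deviation equal to $\sigma$:

$$p(\varepsilon) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\varepsilon^2/2\sigma^2}.$$

We have

$$\alpha_4 = \int_{-\infty}^{\infty}\varepsilon^4 p(\varepsilon)\,d\varepsilon = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}\varepsilon^4 e^{-\varepsilon^2/2\sigma^2}\,d\varepsilon = 3\sigma^4.$$

Similarly,

$$\alpha_2 = \int_{-\infty}^{\infty}\varepsilon^2 p(\varepsilon)\,d\varepsilon = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}\varepsilon^2 e^{-\varepsilon^2/2\sigma^2}\,d\varepsilon = \sigma^2 = \text{MSE}.$$

The variance of the moment estimate $\hat{\alpha}_r$ is defined as the expected squared deviation from the mean, that is,

$$\operatorname{var}[\hat{\alpha}_r] = E\left[(\hat{\alpha}_r - \alpha_r)^2\right] = E[\hat{\alpha}_r^2] - 2\alpha_r E[\hat{\alpha}_r] + \alpha_r^2 = E[\hat{\alpha}_r^2] - \alpha_r^2 = \frac{1}{N^2}\sum_{k=1}^{N}\sum_{l=1}^{N}E[\varepsilon_k^r\varepsilon_l^r] - \alpha_r^2.$$

Since the error samples are independent,

$$E[\varepsilon_k^r\varepsilon_l^r] = \begin{cases} E[\varepsilon_k^{2r}] = \alpha_{2r}, & k = l \\ E[\varepsilon_k^r]\,E[\varepsilon_l^r] = \alpha_r^2, & k \neq l. \end{cases}$$

Hence, the result is

$$\operatorname{var}[\hat{\alpha}_r] = \frac{1}{N^2}\left[N\alpha_{2r} + (N^2 - N)\alpha_r^2\right] - \alpha_r^2 = \frac{\alpha_{2r} - \alpha_r^2}{N}.$$

Because $\hat{\xi} = \hat{\alpha}_2$, we have

$$\operatorname{var}[\hat{\xi}] = \frac{\alpha_4 - \alpha_2^2}{N}.$$

The values of $\alpha_4$ and $\alpha_2$ depend on how $\varepsilon_k$ is distributed. For example, suppose that $\varepsilon_k$ is distributed normally with zero mean and with a variance $\sigma^2$. The mean fourth moment is $\alpha_4 = 3\sigma^4$ and the mean second moment is $\alpha_2 = \sigma^2$. So in this case,

$$\operatorname{var}[\hat{\xi}] = \frac{3\sigma^4 - \sigma^4}{N} = \frac{2\sigma^4}{N} = \frac{2\xi^2}{N}.$$

From this result for the normal distribution of $\varepsilon_k$, we might anticipate that in general the variance of $\hat{\xi}$ could be expressed as

$$\operatorname{var}[\hat{\xi}] = \frac{K\xi^2}{N}.$$

We have shown that $K$ is 2 when $\varepsilon_k$ is distributed normally with zero mean. When the distribution is normal but with nonzero mean, $K$ is somewhat less than 2.

Suppose that $\varepsilon_k$ is uniformly distributed with zero mean and with a standard deviation of $\sigma$. The expected moments for even values of $r$ are

$$\alpha_r = \int_{-\sqrt{3}\sigma}^{\sqrt{3}\sigma}\varepsilon^r p(\varepsilon)\,d\varepsilon = \frac{1}{2\sqrt{3}\sigma}\int_{-\sqrt{3}\sigma}^{\sqrt{3}\sigma}\varepsilon^r\,d\varepsilon = \frac{(\sqrt{3}\sigma)^r}{r+1}.$$

This gives $\alpha_2 = \sigma^2$ and $\alpha_4 = 9\sigma^4/5$, so that

$$\operatorname{var}[\hat{\xi}] = \frac{4\xi^2}{5N}.$$

Table 5.1 Variance of the mean-square-error estimate

Our estimate of the corresponding gradient component is

$$\frac{\partial\hat{\xi}}{\partial v} = \frac{1}{2\delta}\left[\hat{\xi}(v+\delta) - \hat{\xi}(v-\delta)\right],$$

where $V = W - W^*$. We will continue to assume that the error samples (values of $\varepsilon_k$) are independent. Using the zero-mean normal result ($K = 2$), we have

$$\operatorname{var}\left[\frac{\partial\hat{\xi}}{\partial v}\right] = \frac{1}{4\delta^2}\left\{\operatorname{var}\left[\hat{\xi}(v+\delta)\right] + \operatorname{var}\left[\hat{\xi}(v-\delta)\right]\right\} = \frac{1}{2N\delta^2}\left[\xi^2(v+\delta) + \xi^2(v-\delta)\right].$$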
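The constant $K$ in $\operatorname{var}[\hat{\xi}] = K\xi^2/N$ can be estimated by simulation. The sketch below is an illustration under stated assumptions, not part of the original derivation: it draws independent error samples from a zero-mean normal and a zero-mean uniform density with the same hypothetical $\sigma$, and the helper names, the value of $N$ and the trial count are all assumptions.

```python
import random
import statistics


def mse_estimate(sample_error, n):
    """Average of n independent squared error samples (the estimate of xi)."""
    return sum(sample_error() ** 2 for _ in range(n)) / n


def empirical_k(sample_error, xi, n=50, trials=10000):
    """Estimate K in var[xi_hat] = K * xi**2 / n from repeated trials."""
    estimates = [mse_estimate(sample_error, n) for _ in range(trials)]
    return statistics.pvariance(estimates) * n / xi ** 2


rng = random.Random(0)           # fixed seed so the run is repeatable
sigma = 1.5                      # hypothetical standard deviation of the error
xi = sigma ** 2                  # true MSE for a zero-mean error
half_width = sigma * 3 ** 0.5    # uniform on [-sqrt(3)*sigma, sqrt(3)*sigma]

k_normal = empirical_k(lambda: rng.gauss(0.0, sigma), xi)
k_uniform = empirical_k(lambda: rng.uniform(-half_width, half_width), xi)

print("K, normal errors:", k_normal, "(theory 2)")
print("K, uniform errors:", k_uniform, "(theory 4/5)")
```

The printed values fluctuate with the seed and the number of trials, so they should be read as approximate agreement with 2 and 4/5 and not as exact results.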

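If the weight vector has converged to the vicinity of $W^*$, $\xi(v+\delta)$ and $\xi(v-\delta)$ are approximately equal to $\xi_{\min}$, so that for the $n$th gradient component

$$\operatorname{var}\left[\frac{\partial\hat{\xi}}{\partial v_n}\right] \approx \frac{\xi_{\min}^2}{N\delta^2}.$$

Since the values of $N$ and $\delta$ are the same for the estimates of all components of the gradient vector, and since the samples of $\varepsilon_k$ used in all estimates were assumed to be independent, the errors in all estimates are independent and have the same variance.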
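The approximation $\xi_{\min}^2/(N\delta^2)$ for the variance of a gradient component near the minimum can also be checked by simulation. This is a rough sketch assuming zero-mean normal error samples whose variance equals the local value of $\xi$; the surface parameters, $N$, $\delta$ and the function names are hypothetical.

```python
import random
import statistics


def mse_estimate(rng, xi_true, n):
    """Average of n squared zero-mean normal errors whose true MSE is xi_true."""
    std = xi_true ** 0.5
    return sum(rng.gauss(0.0, std) ** 2 for _ in range(n)) / n


def gradient_estimate(rng, xi, v, delta, n):
    """Central-difference gradient estimate from two independent MSE estimates."""
    forward = mse_estimate(rng, xi(v + delta), n)
    backward = mse_estimate(rng, xi(v - delta), n)
    return (forward - backward) / (2.0 * delta)


# Hypothetical one-weight surface and measurement settings.
xi_min, lam, delta, n = 1.0, 1.0, 0.1, 20


def xi(v):
    return xi_min + lam * v * v


rng = random.Random(1)  # fixed seed so the run is repeatable
# v = 0 corresponds to a weight that has converged to W*.
estimates = [gradient_estimate(rng, xi, 0.0, delta, n) for _ in range(10000)]

empirical = statistics.pvariance(estimates)
predicted = xi_min ** 2 / (n * delta ** 2)

print("mean of gradient estimates:", statistics.mean(estimates))
print("empirical variance:", empirical)
print("xi_min**2 / (N * delta**2):", predicted)
```

The empirical variance sits slightly above the prediction because $\xi(\pm\delta) = \xi_{\min} + \lambda\delta^2$ is only approximately $\xi_{\min}$. Note also how the variance grows as $\delta$ shrinks, which is the price paid for a smaller perturbation.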

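With independent, equal-variance errors in all components, the covariance matrix of the estimated gradient vector at the $k$th iteration is accordingly given by

$$\operatorname{cov}[\hat{\nabla}_k] = E\left[(\hat{\nabla}_k - \nabla_k)(\hat{\nabla}_k - \nabla_k)^T\right] = \frac{\xi_{\min}^2}{N\delta^2}\,I.$$

EFFECTS ON THE WEIGHT-VECTOR SOLUTION

The gradient estimation noise $N_k$ is defined by

$$\hat{\nabla}_k = \nabla_k + N_k.$$

We examine the effect of the noisy gradient estimate on the weight-vector solution, first with Newton's method and then with the steepest-descent method.

For Newton's method

$$W_{k+1} = W_k - \mu R^{-1}\hat{\nabla}_k = W_k - \mu R^{-1}\nabla_k - \mu R^{-1}N_k.$$

With $V_k = W_k - W^*$ and $\nabla_k = 2RV_k$,

$$V_{k+1} = V_k - 2\mu V_k - \mu R^{-1}N_k = (1 - 2\mu)V_k - \mu R^{-1}N_k.$$

Note that $R^{-1} = Q\Lambda^{-1}Q^{-1}$, where $Q$ is the eigenvector matrix of $R$ and $\Lambda$ is its diagonal eigenvalue matrix. With $V = QV'$ and $N = QN'$ we have

$$V'_{k+1} = (1 - 2\mu)V'_k - \mu\Lambda^{-1}N'_k.$$

Iterating from $V'_0$:

$$V'_1 = (1 - 2\mu)V'_0 - \mu\Lambda^{-1}N'_0$$

$$\vdots$$

$$V'_k = (1 - 2\mu)^k V'_0 - \mu\Lambda^{-1}\sum_{n=0}^{k-1}(1 - 2\mu)^n N'_{k-n-1}.$$

With $|1 - 2\mu| < 1$ and $k \to \infty$, the “steady-state” solution is

$$V'_k = -\mu\Lambda^{-1}\sum_{n=0}^{\infty}(1 - 2\mu)^n N'_{k-n-1}.$$

For the method of steepest descent

$$W_{k+1} = W_k - \mu\hat{\nabla}_k.$$

With $V = W - W^*$ and $\hat{\nabla}_k = 2RV_k + N_k$, we have

$$V_{k+1} = V_k - \mu(2RV_k + N_k) = (I - 2\mu R)V_k - \mu N_k.$$

Again using $V = QV'$, $N = QN'$ and $R = Q\Lambda Q^{-1}$,

$$V'_{k+1} = (I - 2\mu\Lambda)V'_k - \mu N'_k,$$

$$V'_k = (I - 2\mu\Lambda)^k V'_0 - \mu\sum_{n=0}^{k-1}(I - 2\mu\Lambda)^n N'_{k-n-1}.$$

As $k \to \infty$, with $\mu$ chosen so that the iteration converges, the “steady-state” solution is

$$V'_k = -\mu\sum_{n=0}^{\infty}(I - 2\mu\Lambda)^n N'_{k-n-1}.$$

The covariance of the gradient noise is that of the gradient estimate:

$$\operatorname{cov}[N_k] = E\left[(\hat{\nabla}_k - \nabla_k)(\hat{\nabla}_k - \nabla_k)^T\right] = \frac{\xi_{\min}^2}{N\delta^2}\,I.$$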
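The two noisy recursions can be simulated one principal-axis coordinate at a time, because both are diagonal in the primed coordinates. The sketch below is only an illustration: it assumes the rotated noise components are independent zero-mean normal samples with a common standard deviation `noise_std`, and the step size, eigenvalues and function names are hypothetical. The reference value comes from summing the geometric series in the steady-state solutions for independent noise samples.

```python
import random
import statistics


def simulated_variance(decay, gain, noise_std, rng, steps=100000, burn_in=1000):
    """Run v[k+1] = decay * v[k] - gain * n[k] and return the variance of v."""
    v = 0.0
    samples = []
    for k in range(steps + burn_in):
        v = decay * v - gain * rng.gauss(0.0, noise_std)
        if k >= burn_in:  # discard the initial transient
            samples.append(v)
    return statistics.pvariance(samples)


def series_variance(decay, gain, noise_std):
    """Variance of -gain * sum_n decay**n * n[k-n-1] for independent noise."""
    return (gain * noise_std) ** 2 / (1.0 - decay ** 2)


mu = 0.1                  # hypothetical step size (below 1 / max eigenvalue)
noise_std = 1.0           # hypothetical std of each rotated noise component
eigenvalues = [0.5, 2.0]  # hypothetical eigenvalues of R
rng = random.Random(2)    # fixed seed so the run is repeatable

results = []
for lam in eigenvalues:
    # Newton's method: v' <- (1 - 2 mu) v' - (mu / lam) n'
    newton = (simulated_variance(1.0 - 2.0 * mu, mu / lam, noise_std, rng),
              series_variance(1.0 - 2.0 * mu, mu / lam, noise_std))
    # Steepest descent: v' <- (1 - 2 mu lam) v' - mu n'
    steepest = (simulated_variance(1.0 - 2.0 * mu * lam, mu, noise_std, rng),
                series_variance(1.0 - 2.0 * mu * lam, mu, noise_std))
    results.append((lam, newton, steepest))
    print("lambda =", lam, "Newton (sim, series):", newton,
          "steepest descent (sim, series):", steepest)
```

In this sketch the steady-state weight noise under Newton's method scales as $1/\lambda_n^2$, while under steepest descent it depends on $\lambda_n$ through both the decay factor and the series sum.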

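If we define

$$\operatorname{cov}[V'_k] = E\left[V'_k V'^T_k\right],$$

the steady-state covariance of the weight vector follows from the outer product of each recursion with itself. The cross terms between $V'_k$ and $N'_k$ vanish in the expectation because the noise has zero mean and is independent of $V'_k$.

For Newton's method

$$V'_{k+1}V'^T_{k+1} = (1 - 2\mu)^2 V'_k V'^T_k + \mu^2\Lambda^{-1}N'_k N'^T_k\Lambda^{-1} - \mu(1 - 2\mu)\left[V'_k N'^T_k\Lambda^{-1} + \Lambda^{-1}N'_k V'^T_k\right],$$

$$\operatorname{cov}[V'_k] = (1 - 2\mu)^2\operatorname{cov}[V'_k] + \mu^2(\Lambda^{-1})^2\operatorname{cov}[N'_k],$$

$$\operatorname{cov}[V'_k] = \frac{\mu(\Lambda^{-1})^2}{4(1 - \mu)}\operatorname{cov}[N'_k].$$

For the method of steepest descent

$$V'_{k+1}V'^T_{k+1} = (I - 2\mu\Lambda)V'_k V'^T_k(I - 2\mu\Lambda) + \mu^2 N'_k N'^T_k - \mu\left[(I - 2\mu\Lambda)V'_k N'^T_k + N'_k V'^T_k(I - 2\mu\Lambda)\right],$$

$$\operatorname{cov}[V'_k] = (I - 2\mu\Lambda)^2\operatorname{cov}[V'_k] + \mu^2\operatorname{cov}[N'_k],$$

$$\operatorname{cov}[V'_k] = \frac{\mu}{4}\left(\Lambda - \mu\Lambda^2\right)^{-1}\operatorname{cov}[N'_k].$$

Consider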


