Gradient Descent (梯度下降法) Lecture Slides

Gradient / Directional Derivatives

Gradient Descent (GD)
- Gradient descent (GD), also known as steepest descent (SD)
- Goal: minimize a function iteratively based on its gradient
- Formula for GD (vanilla version): θ_next = θ_now − η ∇f(θ_now) (sketched in code below)
- Normalized version: θ_next = θ_now − η ∇f(θ_now)/‖∇f(θ_now)‖
- With momentum: Δθ = −η ∇f(θ_now) + α Δθ_prev, then θ_next = θ_now + Δθ
- η is the step size, or learning rate

Quiz!
- Vanilla GD: … or …?
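A minimal MATLAB sketch of the vanilla and momentum updates above, on an illustrative quadratic objective (f, eta = 0.05, and alpha = 0.8 are assumptions for the demo, not values from the slides):

```matlab
% Vanilla GD and GD with momentum on a simple anisotropic bowl.
f     = @(x) x(1)^2 + 10*x(2)^2;     % illustrative objective
gradf = @(x) [2*x(1); 20*x(2)];      % its gradient
eta   = 0.05;                        % step size (learning rate)
alpha = 0.8;                         % momentum coefficient
x  = [4; 2];                         % starting point
dx = [0; 0];                         % previous update (for momentum)
for k = 1:100
    g  = gradf(x);
    dx = -eta*g + alpha*dx;          % momentum update; alpha = 0 gives vanilla GD
    x  = x + dx;
end
fprintf('x = (%.4g, %.4g), f(x) = %.3g\n', x(1), x(2), f(x));
```

Setting alpha = 0 recovers vanilla GD; the normalized variant would divide g by norm(g) before taking the step.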
Example of Single-Input Functions
- If n = 1, GD reduces to the problem of going left or right.
- Example animation: /?p=gradient.descent

Basin of Attraction in 1D
- Each point/region with zero gradient has a basin of attraction.

Example of Two-Input Functions: "Peaks" Function (1/2)
- If n = 2, GD needs to find a direction in the 2D plane.
- Example: the "peaks" function in MATLAB
- Animation: gradientDescentDemo.m
- The gradient is perpendicular to the contours. Why? (see the sketch below)
- 3 local maxima, 3 local minima
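The perpendicularity holds because f is constant along a contour, so the directional derivative along the contour's tangent is zero, and the gradient therefore has no tangential component. A small MATLAB sketch that makes this visible on the built-in peaks surface (the grid size 40 and the 20 contour levels are arbitrary choices):

```matlab
% Gradients of the MATLAB "peaks" function cross its contours at right angles.
[X, Y, Z] = peaks(40);                                   % built-in demo surface
[GX, GY]  = gradient(Z, X(1,2)-X(1,1), Y(2,1)-Y(1,1));   % numerical gradient
contour(X, Y, Z, 20); hold on
quiver(X, Y, GX, GY)           % arrows are perpendicular to the contour lines
axis equal                     % equal scaling so right angles look right
hold off
```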
"Peaks" Function (2/2)
- Gradient of the "peaks" function (MATLAB notation):
  dz/dx = -6*(1-x)*exp(-x^2-(y+1)^2) - 6*(1-x)^2*x*exp(-x^2-(y+1)^2) - 10*(1/5-3*x^2)*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*x*exp(-x^2-y^2) - 1/3*(-2*x-2)*exp(-(x+1)^2-y^2)
  dz/dy = 3*(1-x)^2*(-2*y-2)*exp(-x^2-(y+1)^2) + 50*y^4*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*y*exp(-x^2-y^2) + 2/3*y*exp(-(x+1)^2-y^2)
  d(dz/dx)/dx = 36*x*exp(-x^2-(y+1)^2) - 18*x^2*exp(-x^2-(y+1)^2) - 24*x^3*exp(-x^2-(y+1)^2) + 12*x^4*exp(-x^2-(y+1)^2) + 72*x*exp(-x^2-y^2) - 148*x^3*exp(-x^2-y^2) - 20*y^5*exp(-x^2-y^2) + 40*x^5*exp(-x^2-y^2) + 40*x^2*exp(-x^2-y^2)*y^5 - 2/3*exp(-(x+1)^2-y^2) - 4/3*exp(-(x+1)^2-y^2)*x^2 - 8/3*exp(-(x+1)^2-y^2)*x

Basin of Attraction in 2D
- Each point/region with zero gradient has a basin of attraction.

Rosenbrock Function
- The Rosenbrock function
- More about this function
- Animation: /?p=gradient.descent
- Document on how to optimize this function
- Justification for using momentum terms (see the sketch below)
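A hedged sketch of why the momentum term is justified here. The slides do not spell out the function; the standard Rosenbrock form is f(x, y) = (1 − x)^2 + 100(y − x^2)^2, whose narrow curved valley makes plain GD zig-zag and crawl. The step size, momentum coefficient, start point, and iteration count below are illustrative assumptions:

```matlab
% GD on the Rosenbrock function, with and without momentum.
gradf = @(v) [-2*(1-v(1)) - 400*v(1)*(v(2)-v(1)^2);   % df/dx
              200*(v(2)-v(1)^2)];                     % df/dy
eta = 2e-4;
for alpha = [0 0.9]                  % 0 = vanilla GD, 0.9 = with momentum
    v = [-1.5; 2]; dv = [0; 0];
    for k = 1:50000
        dv = -eta*gradf(v) + alpha*dv;
        v  = v + dv;
    end
    fprintf('alpha = %.1f -> ends at (%.3f, %.3f)\n', alpha, v(1), v(2));
end
% The minimum is at (1, 1); the momentum run typically ends much closer
% to it than plain GD does in the same number of iterations.
```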
Properties of Gradient Descent
- No guarantee of reaching the global optimum
- Feasible for differentiable objective functions (which may have a finite number of non-differentiable points)
- Performance heavily dependent on the starting point and the step size (see the sketch below)
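A tiny sketch of the step-size dependence noted above, on an illustrative f(x) = x^2 (the three η values are assumptions for the demo). The update is x ← x − η·2x = (1 − 2η)x, so GD converges for 0 < η < 1 and diverges beyond:

```matlab
% Step-size sensitivity of vanilla GD on f(x) = x^2 (gradient 2x).
for eta = [0.1 0.9 1.1]
    x = 1;
    for k = 1:50
        x = x - eta*2*x;             % contraction factor (1 - 2*eta)
    end
    fprintf('eta = %.1f -> x = %g after 50 steps\n', eta, x);
end
% eta = 0.1 converges smoothly, 0.9 converges while oscillating in sign,
% and 1.1 diverges.
```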
Variants
- Use adaptive step sizes
- Normalize the gradient by its length
- Use a momentum term to reduce zig-zag paths
- Use line minimization at each iteration (see the sketch after this list)

Quiz!
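A minimal sketch of the line-minimization variant: each iteration picks the step size by a 1-D minimization along the current steepest-descent direction, here via MATLAB's fminbnd (the quadratic objective and the search bracket [0, 1] are illustrative assumptions):

```matlab
% Steepest descent with line minimization at each iteration.
f     = @(v) v(1)^2 + 10*v(2)^2;
gradf = @(v) [2*v(1); 20*v(2)];
v = [4; 2];
for k = 1:20
    d   = -gradf(v);                 % steepest-descent direction
    phi = @(t) f(v + t*d);           % objective restricted to the line
    t   = fminbnd(phi, 0, 1);        % 1-D line minimization for the step size
    v   = v + t*d;
end
disp(v)   % approaches [0; 0] without hand-tuning a learning rate
```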
Comparisons of Gradient-Based Optimization
- Gradient descent (GD): treats all parameters as nonlinear
- Hybrid learning of GD + LSE: distinguishes between linear and nonlinear parameters
- Conjugate gradient descent: tries to reach the minimizing point by assuming the objective function is quadratic
- Gauss-Newton (GN) method: linearizes the objective function to treat all parameters as linear
- Levenberg-Marquardt (LM) method: switches smoothly between SD and GN

Gauss-Newton Method
- Synonyms: linearization method, extended Kalman filter method
- Concept: general nonlinear model y = f(x, θ)
- Linearization at θ = θ_now: y = f(x, θ_now) + a1(θ1 − θ1,now) + a2(θ2 − θ2,now) + …
- LSE solution: θ_next = θ_now + η (AᵀA)⁻¹ AᵀB (see the sketch below)
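A minimal Gauss-Newton sketch matching the LSE update above, where A is the Jacobian of the model with respect to θ at the current estimate and B is the residual vector. The exponential model y = θ1·exp(θ2·x), the synthetic data, and η = 1 are assumptions for the demo, not from the slides:

```matlab
% Gauss-Newton iteration theta_next = theta_now + eta*(A'*A)\(A'*B).
x = linspace(0, 2, 50)';                    % inputs
theta_true = [2; -1.3];
y = theta_true(1)*exp(theta_true(2)*x);     % noise-free targets
theta = [1; -0.5];                          % initial guess
eta = 1;                                    % step size (full GN steps)
for k = 1:20
    yhat = theta(1)*exp(theta(2)*x);
    A = [exp(theta(2)*x), theta(1)*x.*exp(theta(2)*x)];  % Jacobian wrt theta
    B = y - yhat;                                        % residuals
    theta = theta + eta*((A'*A)\(A'*B));                 % Gauss-Newton step
end
fprintf('theta = (%.4f, %.4f)\n', theta);   % recovers [2, -1.3] here
```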
Levenberg-Marquardt Method
- Formula: θ_next = θ_now + η (AᵀA + λI)⁻¹ AᵀB
- Effects of λ: small λ → Gauss-Newton method; big λ → gradient descent
- How to update λ: greedy policy → make λ small; cautious policy → make λ big

Quiz!
- Can we use GD to find the minimum of f(x) = |x|?
- What is the gradient of the sigmoid function? Can you express the gradient using the original function? (see the check below)
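Hedged notes on the closing quiz, offered as a sketch rather than the slides' own answers: f(x) = |x| is non-differentiable exactly at its minimizer x = 0, so fixed-step GD can approach 0 but oscillates around it rather than landing on it (consistent with the "finite number of non-differentiable points" property above). The sigmoid s(x) = 1/(1 + exp(-x)) has derivative s(x)(1 − s(x)), i.e., expressible through the original function. A quick numerical check:

```matlab
% Verify that the sigmoid's derivative equals s(x)*(1 - s(x)).
s  = @(x) 1./(1 + exp(-x));
x  = linspace(-5, 5, 11);
h  = 1e-6;
numeric  = (s(x + h) - s(x - h)) / (2*h);   % central-difference derivative
analytic = s(x).*(1 - s(x));                % expressed via the original function
max(abs(numeric - analytic))                % ~1e-10, i.e., they agree
```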