




Gradient and Directional Derivatives

Gradient Descent (GD)
- Also known as steepest descent (SD).
- Goal: minimize a function iteratively based on its gradient.
- Update formula: θ_next = θ_now − η∇f(θ_now)
- Normalized version: θ_next = θ_now − η∇f(θ_now)/‖∇f(θ_now)‖
- With momentum: θ_next = θ_now − η∇f(θ_now) + α(θ_now − θ_prev)
- η is the step size, or learning rate.
- Quiz! (vanilla GD or …)

Example: Single-Input Functions
- If n = 1, GD reduces to the problem of going left or right.
- Animation: /?p=gradient.descent
- Each point/region with zero gradient has a basin of attraction (basin of attraction in 1D).

"Peaks" Function (1/2)
- If n = 2, GD needs to find a direction in the 2D plane.
- Example: the "peaks" function in MATLAB. Animation: gradientDescentDemo.m
- The gradient is perpendicular to the contours. Why?
- The function has 3 local maxima and 3 local minima.

"Peaks" Function (2/2)
Gradient of the "peaks" function:

dz/dx = -6*(1-x)*exp(-x^2-(y+1)^2) - 6*(1-x)^2*x*exp(-x^2-(y+1)^2) - 10*(1/5-3*x^2)*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*x*exp(-x^2-y^2) - 1/3*(-2*x-2)*exp(-(x+1)^2-y^2)

dz/dy = 3*(1-x)^2*(-2*y-2)*exp(-x^2-(y+1)^2) + 50*y^4*exp(-x^2-y^2) + 20*(1/5*x-x^3-y^5)*y*exp(-x^2-y^2) + 2/3*y*exp(-(x+1)^2-y^2)

d(dz/dx)/dx = 36*x*exp(-x^2-(y+1)^2) - 18*x^2*exp(-x^2-(y+1)^2) - 24*x^3*exp(-x^2-(y+1)^2) + 12*x^4*exp(-x^2-(y+1)^2) + 72*x*exp(-x^2-y^2) - 148*x^3*exp(-x^2-y^2) - 20*y^5*exp(-x^2-y^2) + 40*x^5*exp(-x^2-y^2) + 40*x^2*y^5*exp(-x^2-y^2) - 2/3*exp(-(x+1)^2-y^2) - 4/3*x^2*exp(-(x+1)^2-y^2) - 8/3*x*exp(-(x+1)^2-y^2)

- Each point/region with zero gradient has a basin of attraction (basin of attraction in 2D).

Rosenbrock Function
- Rosenbrock function: f(x, y) = (1-x)^2 + 100*(y-x^2)^2
- More about this function. Animation: /?p=gradient.descent
- Document on how to optimize this function.
- Justification for using momentum terms.

Properties of Gradient Descent
- No guarantee of reaching the global optimum.
- Feasible for differentiable objective functions (which may have a finite number of non-differentiable points).
- Performance depends heavily on the starting point and the step size.

Variants
- Use adaptive step sizes.
- Normalize the gradient by its length.
- Use a momentum term to reduce zig-zag paths.
- Use line minimization at each iteration.
- Quiz!

Comparisons of Gradient-Based Optimization
- Gradient descent (GD): treats all parameters as nonlinear.
- Hybrid learning (GD + LSE): distinguishes between linear and nonlinear parameters.
- Conjugate gradient descent: tries to reach the minimizing point by assuming the objective function is quadratic.
- Gauss-Newton (GN) method: linearizes the objective function to treat all parameters as linear.
- Levenberg-Marquardt (LM) method: switches smoothly between SD and GN.

Gauss-Newton Method
- Synonyms: linearization method, extended Kalman filter method.
- Concept: for a general nonlinear model y = f(x, θ), linearize at θ = θ_now:
  y ≈ f(x, θ_now) + a1*(θ1 − θ1,now) + a2*(θ2 − θ2,now) + …
- LSE solution: θ_next = θ_now + η*(A^T*A)^(-1)*A^T*B

Levenberg-Marquardt Method
- Formula: θ_next = θ_now + η*(A^T*A + λI)^(-1)*A^T*B
- Effects of λ: small λ → Gauss-Newton method; large λ → gradient descent.
- How to update λ: greedy policy → make λ small; cautious policy → make λ big.

Quiz!
- Can we use GD to find the minimum of f(x) = |x|?
- What is the gradient of the sigmoid function? Can you express the gradient using the original function?
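The vanilla and momentum updates above can be sketched in a few lines of Python. The quartic test function, the step size eta = 0.01, and the momentum weight alpha are illustrative assumptions, not from the slides; the two starting points show how each falls into a different basin of attraction.

```python
# Sketch of vanilla GD and the momentum variant on a two-minimum 1D function
# f(x) = x^4 - 3x^2 + x (a hypothetical example for illustration).

def f(x):
    return x**4 - 3*x**2 + x

def df(x):
    return 4*x**3 - 6*x + 1

def gd(x0, eta=0.01, alpha=0.0, steps=200):
    """Run GD from x0; alpha=0 gives vanilla GD, alpha>0 adds momentum."""
    x, delta = x0, 0.0
    for _ in range(steps):
        delta = -eta * df(x) + alpha * delta  # momentum accumulates past steps
        x += delta
    return x

# Different starting points land in different basins of attraction:
left = gd(-2.0)   # converges to the left local minimum (negative x)
right = gd(+2.0)  # converges to the right local minimum (positive x)
```

With alpha = 0 this is exactly θ_next = θ_now − η f′(θ_now); a nonzero alpha reuses the previous displacement, which is the zig-zag-damping momentum term listed under Variants.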
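The long derivative formulas for the "peaks" function are easy to mistype, so a numerical sanity check is worthwhile. The sketch below evaluates the analytic dz/dx and dz/dy from the notes and compares them with central finite differences; the test point (0.3, −0.7) and the step h are arbitrary choices.

```python
import math

# The MATLAB "peaks" surface and its analytic partial derivatives,
# checked against central finite differences at one test point.

def peaks(x, y):
    return (3*(1-x)**2 * math.exp(-x**2 - (y+1)**2)
            - 10*(x/5 - x**3 - y**5) * math.exp(-x**2 - y**2)
            - 1/3 * math.exp(-(x+1)**2 - y**2))

def dz_dx(x, y):
    return (-6*(1-x)*math.exp(-x**2 - (y+1)**2)
            - 6*(1-x)**2*x*math.exp(-x**2 - (y+1)**2)
            - 10*(1/5 - 3*x**2)*math.exp(-x**2 - y**2)
            + 20*(x/5 - x**3 - y**5)*x*math.exp(-x**2 - y**2)
            - 1/3*(-2*x - 2)*math.exp(-(x+1)**2 - y**2))

def dz_dy(x, y):
    return (3*(1-x)**2*(-2*y - 2)*math.exp(-x**2 - (y+1)**2)
            + 50*y**4*math.exp(-x**2 - y**2)
            + 20*(x/5 - x**3 - y**5)*y*math.exp(-x**2 - y**2)
            + 2/3*y*math.exp(-(x+1)**2 - y**2))

h = 1e-6
x0, y0 = 0.3, -0.7
num_dx = (peaks(x0 + h, y0) - peaks(x0 - h, y0)) / (2*h)
num_dy = (peaks(x0, y0 + h) - peaks(x0, y0 - h)) / (2*h)
```

If the analytic and numerical values agree to several decimal places, the hand-derived gradient is almost certainly correct; this is the same check one would run before trusting a GD demo such as gradientDescentDemo.m.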
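As a rough illustration of why the step size matters, plain GD can be run on the Rosenbrock function. The starting point, step size, and iteration count below are assumptions for this sketch; the narrow curved valley is what forces the tiny step and the huge iteration count, which in turn motivates the momentum and adaptive-step variants listed above.

```python
# Plain GD on the Rosenbrock function f(x, y) = (1-x)^2 + 100(y - x^2)^2.
# The global minimum is at (1, 1); the curved valley makes progress slow.

def rosenbrock(x, y):
    return (1 - x)**2 + 100*(y - x**2)**2

def grad(x, y):
    dfdx = -2*(1 - x) - 400*x*(y - x**2)
    dfdy = 200*(y - x**2)
    return dfdx, dfdy

x, y, eta = -1.0, 1.0, 1e-3   # assumed start and step size
for _ in range(50_000):
    gx, gy = grad(x, y)
    x, y = x - eta*gx, y - eta*gy
# After many iterations the iterate creeps along the valley toward (1, 1).
```

A larger eta diverges (the transverse curvature near the valley is around 10^3, so stability needs roughly eta < 2/10^3), while this stable eta needs tens of thousands of steps: a concrete case of "performance heavily dependent on starting point and step size".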
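The LM update θ_next = θ_now + η(A^T A + λI)^(-1) A^T B, together with the greedy and cautious λ policies, can be illustrated on a one-parameter curve fit, where the matrix inverse degenerates to a scalar division. The model y = exp(θx), the synthetic data, and the halving/doubling λ schedule are assumptions of this sketch, not from the slides.

```python
import math

# Levenberg-Marquardt on a one-parameter model y = exp(theta * x).
# With one parameter, (J^T J + lambda)^(-1) J^T r is just scalar arithmetic.

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]      # synthetic data, true theta = 0.5

def sse(theta):
    return sum((y - math.exp(theta * x))**2 for x, y in zip(xs, ys))

theta, lam = 2.0, 1.0                     # assumed initial guess and damping
for _ in range(100):
    r = [y - math.exp(theta * x) for x, y in zip(xs, ys)]  # residuals (B)
    J = [x * math.exp(theta * x) for x in xs]              # Jacobian column (A)
    JtJ = sum(j * j for j in J)
    Jtr = sum(j * ri for j, ri in zip(J, r))
    candidate = theta + Jtr / (JtJ + lam)  # the LM step
    if sse(candidate) < sse(theta):
        theta, lam = candidate, lam * 0.5  # success: make lambda small (toward GN)
    else:
        lam *= 2.0                         # failure: make lambda big (toward GD)
```

Small λ recovers the Gauss-Newton step (J^T r)/(J^T J); large λ shrinks the step toward a short gradient-descent move, which is exactly the smooth SD/GN interpolation described above.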
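On the closing quiz: for f(x) = |x| the gradient is ±1 everywhere except at 0, where it is undefined, so fixed-step GD oscillates around the minimum instead of settling on it. For the sigmoid, the derivative can indeed be written using the original function alone, s′(x) = s(x)(1 − s(x)); a minimal check:

```python
import math

# The sigmoid s(x) = 1/(1 + exp(-x)) has derivative s'(x) = s(x) * (1 - s(x)),
# i.e. the gradient is expressible through the function's own value.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Central-difference check at an arbitrary point:
h = 1e-6
x0 = 0.8
numeric = (sigmoid(x0 + h) - sigmoid(x0 - h)) / (2 * h)
```

This identity is why backpropagation through sigmoid units only needs the stored forward-pass activation, not a separate derivative evaluation.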