




Training Recurrent Neural Network
Hung-yi Lee

Goal
[Figure: an RNN unfolded over three time steps, with inputs x1, x2, x3, outputs y1, y2, y3, initial state init, and the weights Wi, Wh, Wo repeated at every step.]
Backpropagation through time (BPTT)

Review: Backpropagation
Forward Pass
Backward Pass
[Figure: a layered network in which the error signal is passed from layer to layer.]

Review: Backpropagation
Backward Pass
[Figure: the error signal propagating from Layer L back to Layer L-1.]

Backpropagation through Time
UNFOLD: a very deep neural network
Input: init, x1, x2, …, xn
Output: yn
[Figure: the unfolded chain init, a1, …, an-2, an-1, an, fed by x1, …, xn-2, xn-1, xn and producing yn.]

Backpropagation through Time
UNFOLD: a very deep neural network
Input: init, x1, x2, …, xn
Output: yn
Some weights are shared.
Initialize w1, w2 by the same value. (The values of w1, w2 should always be the same.)
[Figure: the unfolded network with weight indices i, j, k marking the shared weights; the two weights are drawn as pointers to the same memory.]

BPTT
Forward Pass: Compute a1, a2, a3, a4, …
Backward Pass
[Figure: an RNN unfolded over four steps, with inputs x1 to x4, outputs y1 to y4, initial state init, and hidden states a1 to a4.]
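
The forward and backward passes above can be sketched for a scalar RNN. This is only an illustration: the tanh activation, the squared-error cost, and the names `forward`, `cost`, and `bptt` are assumptions, not taken from the slides. The point it shows is that the backward pass walks the unfolded network from the last step to the first and sums the gradient contributions of every copy of a shared weight.

```python
import math

def forward(xs, wi, wh, wo, init=0.0):
    """Forward pass: compute a1..an and y1..yn for a scalar RNN."""
    a_prev, states, outputs = init, [], []
    for x in xs:
        a = math.tanh(wi * x + wh * a_prev)
        states.append(a)
        outputs.append(wo * a)
        a_prev = a
    return states, outputs

def cost(xs, targets, wi, wh, wo, init=0.0):
    """Assumed cost: half the summed squared error over all steps."""
    _, ys = forward(xs, wi, wh, wo, init)
    return 0.5 * sum((y - t) ** 2 for y, t in zip(ys, targets))

def bptt(xs, targets, wi, wh, wo, init=0.0):
    """Backward pass: return (dC/dwi, dC/dwh, dC/dwo)."""
    states, ys = forward(xs, wi, wh, wo, init)
    g_wi = g_wh = g_wo = 0.0
    da_next = 0.0  # error arriving at a_t from step t+1
    for t in reversed(range(len(xs))):
        a = states[t]
        a_prev = states[t - 1] if t > 0 else init
        dy = ys[t] - targets[t]
        g_wo += dy * a
        da = dy * wo + da_next      # error from the output plus error from the future
        dz = da * (1.0 - a * a)     # back through tanh
        g_wi += dz * xs[t]          # shared weights: contributions are summed
        g_wh += dz * a_prev
        da_next = dz * wh           # error sent back to a_{t-1}
    return g_wi, g_wh, g_wo

xs = [1.0, 0.5, -0.3, 0.8]
targets = [0.2, -0.1, 0.4, 0.0]
print(bptt(xs, targets, wi=0.5, wh=0.9, wo=1.2))
```

A finite-difference check on `cost` is a convenient way to confirm that gradients computed this way are correct.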

Unfortunately, it is not easy to train RNN.

The error surface is rough.
The error surface is either very flat or very steep.
[Figure: cost plotted over w1 and w2.]
Source: /proceedings/papers/v28/pascanu13.pdf

Toy Example
[Figure: a linear chain unrolled over n steps, with inputs 1, 0, 0, …, weights of 1 on the input and output connections, recurrent weight w, outputs y1, y2, y3, …, yn, and cost Cn.]
If n = 1000:
[Figure: panels for n = 10, n = 100, n = 1000.]
Only extremely large and small values
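
A small numeric sketch of the toy example follows. It assumes the reading of the figure given here: a linear chain whose input is 1 at the first step and 0 afterwards, with input and output weights of 1 and recurrent weight w, so that yn equals w raised to the power n-1. The function name `toy_output` is hypothetical.

```python
def toy_output(w, n):
    """Output yn of the assumed linear chain with recurrent weight w."""
    a = 0.0
    for step in range(n):
        x = 1.0 if step == 0 else 0.0  # input is 1 only at the first step
        a = w * a + x                  # linear activation, input weight 1
    return a                           # output weight 1

for w in (1.0, 1.01, 0.99, 0.01):
    print(w, toy_output(w, 1000))
```

With n = 1000, a change of 0.01 in w moves the output from 1 to a value in the tens of thousands, while values of w below 1 drive it toward zero, which matches the slide's remark that only extremely large and small values appear.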

Backpropagation through Time: Gradient Vanishing/Exploding
For simplicity, assume linear activation function.
[Figure: the unfolded chain init, a1, …, an-2, an-1, an, fed by x1, …, xn-2, xn-1, xn and producing yn.]
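
Under the linear-activation assumption, passing the error signal back one step multiplies it by the transposed recurrent matrix, so after many steps its size is governed by repeated multiplication. The sketch below illustrates this with small hand-picked matrices; the names `matvec_t` and `error_norms` and the example matrices are assumptions made for illustration.

```python
def matvec_t(W, v):
    """Return W^T v for a square matrix stored as a list of rows."""
    n = len(W)
    return [sum(W[i][j] * v[i] for i in range(n)) for j in range(n)]

def error_norms(W, delta, steps):
    """Norm of the error signal after each backward step (linear activation)."""
    norms = []
    for _ in range(steps):
        delta = matvec_t(W, delta)
        norms.append(sum(d * d for d in delta) ** 0.5)
    return norms

shrinking = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical recurrent weights
growing = [[1.5, 0.0], [0.0, 1.5]]

print(error_norms(shrinking, [1.0, 1.0], 50)[-1])  # vanishing
print(error_norms(growing, [1.0, 1.0], 50)[-1])    # exploding
```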

Gradient Vanishing/Exploding
[Figure: panels for 1 step, 2 steps, 5 steps, 10 steps, 20 steps, 50 steps.]

Possible Solutions

Clipped Gradient
[Figure: cost over w1 and w2, with the clipped gradient marked.]
theano.tensor.clip(x, min, max)
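
Two common ways of clipping a gradient can be sketched in plain Python. The first clamps each component into a range, which is the element-wise behaviour of `theano.tensor.clip(x, min, max)`. The second rescales the whole gradient when its norm exceeds a threshold; it is included as a widely used alternative, and the slide does not say which variant is intended. The function names and the threshold values are assumptions.

```python
import math

def clip_elementwise(grad, lo, hi):
    """Clamp every component of the gradient into [lo, hi]."""
    return [max(lo, min(hi, g)) for g in grad]

def clip_by_norm(grad, threshold):
    """Rescale the gradient so that its L2 norm is at most threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    return [g * threshold / norm for g in grad]

steep = [300.0, -400.0]  # a gradient taken on a very steep part of the surface
print(clip_elementwise(steep, -1.0, 1.0))
print(clip_by_norm(steep, 1.0))
```

Element-wise clipping can change the direction of the update, whereas norm rescaling keeps the direction and only shortens the step.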
Source (Clipped Gradient figure): /proceedings/papers/v28/pascanu13.pdf

NAG
Methods: Gradient descent, Momentum, Nesterov's Accelerated Gradient (NAG)
[Figure: the three methods compared on a valley.]
Source: /~fritz/absps/momentum.pdf

NAG
Momentum
Nesterov's Accelerated Gradient (NAG)
[Figure: for each method, the movement is built from the gradient and the last movement; where Gradient = 0, the last movement still carries the update forward.]
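
The difference between the two update rules can be sketched as follows, using one common formulation: momentum evaluates the gradient at the current point, while NAG evaluates it at the point reached by first applying the last movement. The function names, the learning rate `lr`, and the decay factor `mu` are assumptions chosen for illustration.

```python
def momentum_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Movement = decayed last movement minus the gradient at the current point."""
    v = mu * v - lr * grad_fn(w)
    return w + v, v

def nag_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Movement = decayed last movement minus the gradient at the look-ahead point."""
    v = mu * v - lr * grad_fn(w + mu * v)
    return w + v, v

def run(step_fn, grad_fn, w=5.0, steps=400):
    """Apply one of the update rules repeatedly, starting with no movement."""
    v = 0.0
    for _ in range(steps):
        w, v = step_fn(w, v, grad_fn)
    return w

quadratic_grad = lambda w: 2.0 * w  # gradient of the cost w**2
print(run(momentum_step, quadratic_grad))
print(run(nag_step, quadratic_grad))
```

In both rules the last movement keeps the parameters moving even where the gradient is zero, which is what lets these methods cross flat regions.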

RMSProp
Review: Adagrad
Use first derivative to estimate second derivative.
[Figure: regions of the error surface calling for a larger learning rate and a smaller learning rate.]

RMSProp
Error surface can be even more complex when training RNN.
[Figure: regions of the error surface calling for a larger learning rate and a smaller learning rate.]

RMSProp
Root Mean Square of the gradients with previous gradients being decayed
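
A minimal sketch of this idea for a single parameter is shown below. It keeps a running mean of squared gradients in which older gradients are decayed by a factor `alpha`, and divides each step by the root of that mean. The initialisation from the first gradient, the small constant `eps`, the default values, and the function name are assumptions; the slide's exact formula did not survive extraction.

```python
import math

def rmsprop(grad_fn, w, lr=0.01, alpha=0.9, steps=1000, eps=1e-8):
    """Minimise a one-parameter cost with an RMSProp-style update."""
    mean_sq = None
    for _ in range(steps):
        g = grad_fn(w)
        if mean_sq is None:
            mean_sq = g * g  # start the running mean from the first gradient
        else:
            mean_sq = alpha * mean_sq + (1.0 - alpha) * g * g  # decay older gradients
        w -= lr * g / (math.sqrt(mean_sq) + eps)
    return w

# Cost (w - 3)**2 has gradient 2 * (w - 3) and its minimum at w = 3.
print(rmsprop(lambda w: 2.0 * (w - 3.0), w=0.0))
```

Because the step is divided by the recent gradient magnitude, the effective learning rate grows where gradients are small and shrinks where they are large.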
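
LSTM
[Figure: a network with inputs x1 and x2 in which the neurons are replaced by LSTM cells; each cell takes the input four times.]
4 times of parameters
LSTM can address the gradient vanishing problem.

LSTM
[Figure: two consecutive LSTM steps. At each step the input (xt, then xt+1) produces z, zi, zf, zo and the output (yt, then yt+1); ht and ht+1 are passed between steps.]
Extension: "peephole"

LSTM
[Figure: the same two LSTM steps annotated for the backward pass with a, b, δ, and the transposed weights W^T.]
Constant Error Carrousel (CEC)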
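
One LSTM step can be sketched for scalar inputs using the slide's names z, zi, zf, zo for the block input and the input, forget, and output gates. The sketch assumes the standard formulation without peepholes, with sigmoid gates and tanh elsewhere; the `params` layout and the function names are hypothetical. Each of the four pre-activations has its own weights, which is where the four-fold parameter count comes from.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, params):
    """params maps each of 'z', 'zi', 'zf', 'zo' to (w_x, w_h, bias)."""
    pre = {name: wx * x + wh * h_prev + b for name, (wx, wh, b) in params.items()}
    z = math.tanh(pre["z"])      # candidate value to write into the cell
    i = sigmoid(pre["zi"])       # input gate
    f = sigmoid(pre["zf"])       # forget gate
    o = sigmoid(pre["zo"])       # output gate
    c = f * c_prev + i * z       # memory is updated by addition
    h = o * math.tanh(c)
    return h, c

# Hypothetical weights: input gate closed, forget gate open.
params = {"z": (1.0, 0.0, 0.0), "zi": (0.0, 0.0, -100.0),
          "zf": (0.0, 0.0, 100.0), "zo": (0.0, 0.0, 0.0)}
h, c = 0.0, 0.7
for x in (1.0, -2.0, 3.0):
    h, c = lstm_step(x, h, c, params)
print(c)
```

With the forget gate open and the input gate closed, the cell value is carried through unchanged, so the error passed back along the cell is neither shrunk nor amplified. This is the behaviour the Constant Error Carrousel refers to.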

Other Simpler Variants
GRU: Cho, Kyunghyun, et al. "Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation", EMNLP, 2014
SCRN: Mikolov, Tomas, et al. "Learning longer memory in recurrent neural networks", ICLR 2015

Better Initialization
Vanilla RNN: Initialized with Identity matrix + ReLU
Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton, "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units", 2015

Concluding Remarks
Be careful when training RNN …
Possible solutions:
- Clipping the gradients
- Advanced optimization technology: NAG, RMSprop
- Try LSTM (or other simpler variants)
- Better initialization
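
The identity initialisation listed under Better Initialization can be illustrated with a tiny ReLU recurrent step. With the recurrent matrix set to the identity and no input, a non-negative hidden state is copied forward unchanged. The helper names and the example state are assumptions made for this sketch.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def relu_rnn_step(Wh, a_prev, u):
    """a_t = ReLU(Wh a_{t-1} + u), where u is the already-projected input."""
    n = len(Wh)
    return [max(0.0, sum(Wh[i][j] * a_prev[j] for j in range(n)) + u[i])
            for i in range(n)]

Wh = identity(2)
state = [0.3, 1.2]  # hypothetical non-negative hidden state
for _ in range(100):
    state = relu_rnn_step(Wh, state, [0.0, 0.0])
print(state)
```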