Training Recurrent Neural Network
Hung-yi Lee

Goal
Learn the RNN parameters Wi (input weights), Wh (recurrent weights), and Wo (output weights) from training pairs of input sequences x1, x2, x3, ... and target output sequences y1, y2, y3, .... The memory starts from an initial value init = 0, and the same Wi, Wh, and Wo are reused at every time step:

    a_t = f(Wi x_t + Wh a_(t-1)),    y_t = f(Wo a_t)

The parameters are learned by gradient descent, with the gradients computed by backpropagation through time (BPTT).

Review: Backpropagation
Forward pass: push the input through the network layer by layer and cache every layer's activations.
Backward pass: propagate an error signal from the output back toward the input; the error signal at layer L-1 is the error signal at layer L pushed through the transposed weight matrix of layer L and multiplied by the derivative of layer L-1's activation function.

Backpropagation through Time
UNFOLD the RNN over time: a network that reads init, x1, x2, ..., xn and produces the output yn is equivalent to a very deep feedforward network with one layer per time step. init and x1 produce a1, a1 and x2 produce a2, and so on up to an, from which yn is computed. Ordinary backpropagation applied to this unfolded network is backpropagation through time.

In the unfolded network, some weights are shared: the connection from neuron j to neuron i is the same parameter at every time step. Shared weights are trained the usual way: initialize w1 and w2 by the same value, and keep them the same throughout training by applying the same update (the sum of the gradients of both copies) to each. (In the implementation, every copy is literally the same memory pointer.)

BPTT
Forward pass: run x1, x2, x3, x4 through the network and compute (and store) a1, a2, a3, a4.
Backward pass: starting from the errors at the outputs y1, y2, y3, y4, propagate the error signals backward through the unfolded network, accumulating the gradients of the shared weights.
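
A minimal BPTT sketch in plain numpy; the tanh activation, the squared-error cost at every output, and all shapes are illustrative assumptions, not details fixed by the slides. Note how the gradients of the shared weights Wi, Wh, Wo are summed over time steps, which is exactly how the unfolded copies are kept equal:

    import numpy as np

    def bptt(xs, ts, Wi, Wh, Wo):
        """Forward then backward pass through time for a tanh RNN with
        cost 0.5 * ||y_t - t_t||^2 at every step. Returns the summed
        gradients of the shared weights Wi, Wh, Wo."""
        H = Wh.shape[0]
        a_prev = np.zeros(H)                    # init memory
        prevs, acts, ys = [], [], []
        # Forward pass: compute a1, a2, ..., an
        for x in xs:
            a = np.tanh(Wi @ x + Wh @ a_prev)
            prevs.append(a_prev)                # a_(t-1), needed for dWh
            acts.append(a)
            ys.append(Wo @ a)
            a_prev = a
        # Backward pass: propagate error signals from every output
        dWi, dWh, dWo = map(np.zeros_like, (Wi, Wh, Wo))
        da_next = np.zeros(H)                   # error arriving from step t+1
        for t in reversed(range(len(xs))):
            dy = ys[t] - ts[t]                  # d cost / d y_t
            dWo += np.outer(dy, acts[t])
            da = Wo.T @ dy + da_next            # error signal at a_t
            dz = da * (1 - acts[t] ** 2)        # through the tanh
            dWi += np.outer(dz, xs[t])          # shared weights: sum over t
            dWh += np.outer(dz, prevs[t])
            da_next = Wh.T @ dz                 # error signal sent to a_(t-1)
        return dWi, dWh, dWo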

Unfortunately, it is not easy to train an RNN.

The error surface is rough
Plotted against two weights w1 and w2, the cost surface of an RNN is either very flat or very steep (Pascanu et al., 2013; source: /proceedings/papers/v28/pascanu13.pdf).

Toy example
Take the simplest possible RNN: a single linear neuron with input weight 1 and recurrent weight w, fed the input 1 at the first time step and 0 afterwards, with the cost Cn measured at the final output yn. Then y1 = 1, y2 = w, y3 = w^2, ..., yn = w^(n-1).

If n = 1000, yn = w^999. Plotting Cn against w for n = 10, 100, and 1000 shows that with growing n the gradient takes only extremely large or extremely small values: the surface is a flat plateau next to a steep cliff around w = 1.
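
This can be checked directly. The slide leaves the exact cost unspecified; for illustration assume Cn = yn = w^(n-1), so dCn/dw = (n-1) w^(n-2):

    # Gradient of the toy example, dCn/dw = (n-1) * w**(n-2),
    # evaluated just below and just above w = 1.
    for n in (10, 100, 1000):
        for w in (0.99, 1.01):
            grad = (n - 1) * w ** (n - 2)
            print(f"n={n:4d}  w={w}  dCn/dw={grad:.3e}")
    # As n grows, the gradient at w=1.01 blows up while the
    # gradient at w=0.99 shrinks toward zero.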

Gradient Vanishing / Exploding
The toy example is exactly what happens in the unfolded network. For simplicity, assume a linear activation function: the error signal traveling from the cost at time step n back to time step 1 is multiplied by Wh^T once per step, so after k steps it has been scaled by (Wh^T)^k. Components along eigenvalues of Wh larger than 1 explode; components along eigenvalues smaller than 1 vanish.
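
A numeric illustration of that scaling; the 2x2 scaled-identity Wh is an arbitrary choice made so the eigenvalues are obvious (compare the 1/2/5/10/20/50-step curves on the next slide):

    import numpy as np

    rng = np.random.default_rng(0)
    delta = rng.standard_normal(2)              # some error signal
    for scale in (0.5, 1.5):                    # eigenvalues < 1 vs > 1
        Wh = scale * np.eye(2)
        d = delta.copy()
        for step in range(1, 51):
            d = Wh.T @ d                        # one step backward in time
            if step in (1, 2, 5, 10, 20, 50):
                print(f"scale={scale}  steps={step:2d}  "
                      f"|delta|={np.linalg.norm(d):.3e}")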

[Figure: magnitude of the error signal after 1, 2, 5, 10, 20, and 50 steps of backpropagation.]

Possible Solutions

Clipped Gradient
When the gradient becomes too large, clip it before taking the update step, so that a single steep cliff in the w1-w2 cost surface cannot throw the parameters far away. In Theano: theano.tensor.clip(x, min, max). Source: /proceedings/papers/v28/pascanu13.pdf
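
theano.tensor.clip is element-wise; the variant proposed by Pascanu et al. rescales the whole gradient by its norm. A library-free sketch of both, over a list of numpy gradient arrays, with arbitrary thresholds:

    import numpy as np

    def clip_elementwise(grads, lo=-1.0, hi=1.0):
        # Element-wise clipping, what theano.tensor.clip(x, min, max) does.
        return [np.clip(g, lo, hi) for g in grads]

    def clip_by_norm(grads, threshold=5.0):
        # Norm clipping: rescale the gradient when its global norm is
        # too large, keeping its direction.
        norm = np.sqrt(sum((g ** 2).sum() for g in grads))
        if norm > threshold:
            grads = [g * (threshold / norm) for g in grads]
        return grads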

Methods: Valley
Three ways to descend a valley-shaped surface: plain gradient descent, Momentum, and Nesterov's Accelerated Gradient (NAG). Source: http://www.cs.toronto.edu/~fritz/absps/momentum.pdf

NAG
With Momentum, the movement at each step is the last movement combined with the current gradient step, so the parameters keep moving through flat regions where the gradient = 0 instead of stalling. NAG evaluates the gradient at the look-ahead point already reached by the last movement, which lets it correct course earlier.
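
A sketch of the three update rules, written against a generic gradient function; the learning rate and momentum coefficient are arbitrary illustrative values:

    def sgd_step(w, grad, lr=0.01):
        return w - lr * grad(w)

    def momentum_step(w, v, grad, lr=0.01, mu=0.9):
        v = mu * v - lr * grad(w)            # movement = last movement + gradient step
        return w + v, v

    def nag_step(w, v, grad, lr=0.01, mu=0.9):
        v = mu * v - lr * grad(w + mu * v)   # gradient at the look-ahead point
        return w + v, v

    # e.g. minimizing w**2:  w, v = nag_step(w, v, lambda u: 2 * u)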

RMSProp
Review: Adagrad divides each parameter's learning rate by the accumulated magnitude of its past gradients, giving a larger learning rate where gradients have been small and a smaller learning rate where they have been large; it uses the first derivatives to estimate the second derivative.
The error surface can be even more complex when training an RNN: a direction that needs a larger learning rate at one point may need a smaller one later. RMSProp therefore divides by the root mean square of the gradients with previous gradients being decayed, so the normalizer tracks the recent gradients instead of the whole history.
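
A sketch contrasting the two update rules on numpy parameters; eps, the learning rates, and the decay rate rho are conventional values, not taken from the slides:

    import numpy as np

    def adagrad_step(w, g, acc, lr=0.01, eps=1e-8):
        acc = acc + g ** 2                        # sum of ALL squared gradients
        return w - lr * g / (np.sqrt(acc) + eps), acc

    def rmsprop_step(w, g, acc, lr=0.001, rho=0.9, eps=1e-8):
        acc = rho * acc + (1 - rho) * g ** 2      # decayed root mean square
        return w - lr * g / (np.sqrt(acc) + eps), acc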

RootMeanSqua22x1x2++++++++Input

4timesofparametersLSTMcanaddressthegradientvanishingproblem.x1x2++++++++Input

4timesof23LSTMxtzzi

At each time step, xt produces the cell input z and three gate signals: zi (input gate), zf (forget gate), and zo (output gate). The gated cell state yields the output yt and the state ht handed to the computation at xt+1, where the same structure produces yt+1 and ht+1. Extension: "peephole" connections additionally let the gate signals see the cell state.
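
A minimal numpy sketch of one step of this cell. The sigmoid gates, the tanh nonlinearities, and packing the four transforms into a single matrix W of shape (4H, D+H) over the concatenated [xt, h_(t-1)] are standard conventions assumed here, not details fixed by the slide; peephole connections are omitted:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def lstm_step(x, h_prev, c_prev, W, b):
        """One LSTM step. W packs four blocks of parameters, one per
        signal, hence the '4 times the parameters' of a vanilla RNN."""
        H = h_prev.shape[0]
        s = W @ np.concatenate([x, h_prev]) + b
        z  = np.tanh(s[0:H])         # cell input
        zi = sigmoid(s[H:2*H])       # input gate
        zf = sigmoid(s[2*H:3*H])     # forget gate
        zo = sigmoid(s[3*H:4*H])     # output gate
        c = zf * c_prev + zi * z     # memory cell update
        h = zo * np.tanh(c)          # output y_t / state passed onward
        return h, c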

Constant Error Carrousel (CEC)
Why does this help with vanishing gradients? In an ordinary network, the error signal delta flowing from b back to a is multiplied by a weight matrix transpose W^T at every layer, so it shrinks or grows geometrically. In the LSTM, the memory cell is connected to itself across time steps with a constant weight of 1, so the error signal flowing backward along the cell-to-cell path is multiplied by 1 at every step: the error is carried over many time steps without decaying. This self-loop is the constant error carrousel.

Other Simpler Variants
GRU: Cho, Kyunghyun, et al., "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation", EMNLP 2014.
SCRN: Mikolov, Tomas, et al., "Learning Longer Memory in Recurrent Neural Networks", ICLR 2015.

Better Initialization
Vanilla RNN: initialize the recurrent weights with the identity matrix and use the ReLU activation function.
Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton, "A Simple Way to Initialize Recurrent Networks of Rectified Linear Units", 2015.
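
A sketch of that initialization, assuming the same simple numpy RNN shapes as the BPTT sketch above; the small scale for the input weights is an illustrative choice:

    import numpy as np

    def irnn_init(input_dim, hidden_dim, scale=0.001, seed=0):
        rng = np.random.default_rng(seed)
        Wh = np.eye(hidden_dim)                  # identity recurrent matrix
        Wi = scale * rng.standard_normal((hidden_dim, input_dim))
        b = np.zeros(hidden_dim)
        return Wi, Wh, b

    relu = lambda v: np.maximum(v, 0.0)          # ReLU instead of tanh
    # a_t = relu(Wi @ x_t + Wh @ a_prev + b)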

Concluding Remarks
Be careful when training an RNN. Possible solutions:
- Clipping the gradients
- Advanced optimization technology: NAG, RMSProp
- Try LSTM (or other simpler variants)
- Better initialization
