
Natural Language Processing with Deep Learning

CS224N/Ling284

Christopher Manning

Lecture 7: Machine Translation, Sequence-to-Sequence and Attention

Lecture Plan

Today we will:
1. Introduce a new task: Machine Translation [15 mins]
   … which is a major use-case of …
2. A new neural architecture: sequence-to-sequence [45 mins]
   … which is improved by …
3. A new neural technique: attention [20 mins]

Announcements:
• Assignment 3 is due today – I hope your dependency parsers are parsing text!
• Assignment 4 out today – covered in this lecture; you get 9 days for it (!), due Thu
• Get started early! It's bigger and harder than the previous assignments
• Thursday's lecture about choosing final projects

Section 1: Pre-Neural Machine Translation

Machine Translation

Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language).

x: L'homme est né libre, et partout il est dans les fers
y: Man is born free, but everywhere he is in chains
– Rousseau

The early history of MT: 1950s

• Machine translation research began in the early 1950s on machines less powerful than high-school calculators
• Foundational work on automata, formal languages, probabilities, and information theory
• MT heavily funded by the military, but basically just simple rule-based systems doing word substitution
• Human language is more complicated than that, and varies more across languages!
• Little understanding of natural language syntax, semantics, pragmatics
• Problem soon appeared intractable

1-minute video showing 1954 MT: https://youtu.be/K-HfpsHPmvw

1990s-2010s: Statistical Machine Translation

• Core idea: Learn a probabilistic model from data
• Suppose we're translating French → English.
• We want to find the best English sentence y, given the French sentence x:

  $\arg\max_y P(y \mid x)$

• Use Bayes' Rule to break this down into two components to be learned separately (a toy scoring example follows this slide):

  $= \arg\max_y P(x \mid y)\, P(y)$

  • Translation Model $P(x \mid y)$: models how words and phrases should be translated (fidelity). Learnt from parallel data.
  • Language Model $P(y)$: models how to write good English (fluency). Learnt from monolingual data.
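To make the noisy-channel idea concrete, here is a toy Python sketch. All candidate translations and probabilities are invented for illustration only (they are not from the lecture); the point is that the argmax combines a fidelity score and a fluency score.

```python
import math

# Toy noisy-channel scoring: pick the y maximizing P(x|y) * P(y),
# i.e. log P(x|y) + log P(y). All numbers below are made up.
translation_model = {          # P(x | y): fidelity, learnt from parallel data
    "he is in chains": 0.30,
    "he is in irons": 0.25,
    "him is chain": 0.40,      # faithful word-for-word, but bad English
}
language_model = {             # P(y): fluency, learnt from monolingual data
    "he is in chains": 0.010,
    "he is in irons": 0.008,
    "him is chain": 0.0001,
}

def score(y):
    return math.log(translation_model[y]) + math.log(language_model[y])

best = max(translation_model, key=score)
print(best)  # "he is in chains": fluent AND faithful wins the argmax
```

The word-for-word candidate gets the highest translation-model score but a very low language-model score, so the fluent and faithful candidate wins the argmax.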

1990s-2010s: Statistical Machine Translation

• Question: How to learn the translation model?
• First, need a large amount of parallel data (e.g., pairs of human-translated French/English sentences)

The Rosetta Stone: Ancient Egyptian, Demotic, Ancient Greek

Learning alignment for SMT

• Question: How to learn the translation model from the parallel corpus?
• Break it down further: introduce a latent variable a into the model:

  $P(x, a \mid y)$

  where a is the alignment, i.e. the word-level correspondence between source sentence x and target sentence y.

What is alignment?

Alignment is the correspondence between particular words in the translated sentence pair.

• Typological differences between languages lead to complicated alignments!
• Note: some words have no counterpart

Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. /anthology/J93-2003

Alignment is complex

• Alignment can be many-to-one
• Alignment can be one-to-many
• Alignment can be many-to-many (phrase-level)

(Examples from Brown et al., 1993, as above.)

Learning alignment for SMT

• We learn $P(x, a \mid y)$ as a combination of many factors, including:
  • Probability of particular words aligning (also depends on position in sentence)
  • Probability of particular words having a particular fertility (number of corresponding words)
  • etc.
• Alignments a are latent variables: they aren't explicitly specified in the data!
• They require the use of special learning algorithms (like Expectation-Maximization) for learning the parameters of distributions with latent variables (a minimal EM sketch follows this list)
• In older days, we used to do a lot of that in CS224N, but now see CS228!
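The lecture does not give the algorithm itself, but a minimal EM sketch in the spirit of IBM Model 1 (word-translation probabilities only, ignoring position and fertility, on a tiny invented corpus) shows how alignment parameters can be estimated from sentence pairs alone:

```python
from collections import defaultdict

# Tiny invented parallel corpus: French -> English
corpus = [
    ("la maison".split(),  "the house".split()),
    ("la fleur".split(),   "the flower".split()),
    ("une maison".split(), "a house".split()),
]

# Initialise t(f|e) uniformly over the French vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: posterior over which English word e aligns to f
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate translation probabilities from expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))   # high: learnt purely from co-occurrence
print(round(t[("maison", "the")], 3))     # low
```

Each E-step computes a soft alignment posterior for every source word; the M-step re-estimates t(f|e) from those expected counts.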

Decoding for SMT

• Question: How to compute this argmax, $\hat{y} = \arg\max_y P(x \mid y)\, P(y)$ (Translation Model × Language Model)?
• We could enumerate every possible y and calculate the probability? → Too expensive!
• Answer: Impose strong independence assumptions in the model, use dynamic programming for globally optimal solutions (e.g. the Viterbi algorithm).
• This process is called decoding.

Decoding for SMT

[Figure: translation options for the German source "er geht ja nicht nach hause", shown word by word, e.g. "he / it / , it / , he", "is / are / goes / go / will be", "yes / of course", "not / do not / does not / is not", "after / to / according to / in", "house / home / chamber / at home"; decoding searches over combinations of these options.]

Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. /core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5

1990s-2010s: Statistical Machine Translation

• SMT was a huge research field
• The best systems were extremely complex
  • Hundreds of important details we haven't mentioned here
  • Systems had many separately-designed subcomponents
  • Lots of feature engineering
    • Need to design features to capture particular language phenomena
  • Require compiling and maintaining extra resources
    • Like tables of equivalent phrases
  • Lots of human effort to maintain
    • Repeated effort for each language pair!

Section 2: Neural Machine Translation

[Image slides: "MT research", "2014 (dramatic reenactment)"]

What is Neural Machine Translation?

• Neural Machine Translation (NMT) is a way to do Machine Translation with a single end-to-end neural network
• The neural network architecture is called a sequence-to-sequence model (aka seq2seq) and it involves two RNNs

Neural Machine Translation (NMT)

The sequence-to-sequence model

[Figure: the Encoder RNN reads the source sentence (input) "il a m'entarté" and produces an encoding of the source sentence, which provides the initial hidden state for the Decoder RNN. The Decoder RNN is a Language Model that generates the target sentence (output) "he hit me with a pie <END>", conditioned on that encoding, taking the argmax word at each step.]

Note: this diagram shows test-time behavior: the decoder output is fed in as the next step's input. (A minimal code sketch of the two-RNN architecture follows.)
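As a concrete sketch of the two-RNN idea, here is a hypothetical minimal PyTorch encoder-decoder (my own implementation choices, not the lecture's exact model): the encoder's final hidden state conditions the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the final encoder state initialises the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)   # scores over the target vocab

    def forward(self, src_ids, tgt_in_ids):
        # Encoder RNN produces an encoding of the source sentence
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder RNN is a conditional LM, conditioned via the initial hidden state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)                  # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))      # e.g. "il a m'entarté" as ids
tgt_in = torch.randint(0, 1000, (2, 8))   # "<START> he hit me with a pie"
print(model(src, tgt_in).shape)           # torch.Size([2, 8, 1000])
```

At test time you would instead feed each predicted word back in as the next decoder input, as the diagram note says.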

Sequence-to-sequence is versatile!

• Sequence-to-sequence is useful for more than just MT
• Many NLP tasks can be phrased as sequence-to-sequence:
  • Summarization (long text → short text)
  • Dialogue (previous utterances → next utterance)
  • Parsing (input text → output parse as sequence)
  • Code generation (natural language → Python code)

Neural Machine Translation (NMT)

• The sequence-to-sequence model is an example of a Conditional Language Model
  • Language Model, because the decoder is predicting the next word of the target sentence y
  • Conditional, because its predictions are also conditioned on the source sentence x
• NMT directly calculates $P(y \mid x)$:

  $P(y \mid x) = P(y_1 \mid x)\, P(y_2 \mid y_1, x)\, P(y_3 \mid y_1, y_2, x) \cdots P(y_T \mid y_1, \dots, y_{T-1}, x)$

  where each factor is the probability of the next target word, given the target words so far and the source sentence x.
• Question: How to train an NMT system?
• Answer: Get a big parallel corpus…

Training a Neural Machine Translation system

[Figure: the Encoder RNN reads the source sentence (from corpus) "il a m'entarté"; the Decoder RNN is fed "<START> he hit me with a pie" and predicts the target sentence (from corpus) "he hit me with a pie <END>".]

The loss is the average per-step negative log probability of the true next target word:

$J = \frac{1}{T} \sum_{t=1}^{T} J_t$, where $J_t = -\log P(y_t^{*} \mid y_1^{*}, \dots, y_{t-1}^{*}, x)$

e.g., $J_1$ = negative log prob of "he", $J_4$ = negative log prob of "with", $J_7$ = negative log prob of <END>.

Seq2seq is optimized as a single system. Backpropagation operates "end-to-end". (A short code sketch of this loss follows.)
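A minimal sketch of that loss in PyTorch, assuming logits shaped (batch, T, vocab) from a model like the Seq2Seq sketch above; the random tensors below merely stand in for real model outputs and corpus data.

```python
import torch
import torch.nn as nn

# J = (1/T) * sum_t J_t, with J_t = -log P(true word t | previous true words, x).
batch, T, vocab = 2, 8, 1000
logits = torch.randn(batch, T, vocab, requires_grad=True)  # stand-in for model output
gold = torch.randint(0, vocab, (batch, T))                  # target sentence from corpus

loss_fn = nn.CrossEntropyLoss()                # mean negative log-likelihood
J = loss_fn(logits.reshape(-1, vocab), gold.reshape(-1))
J.backward()                                   # backpropagation operates "end-to-end"
print(J.item())
```

nn.CrossEntropyLoss averages the negative log probability of the gold word over all positions, which matches J = (1/T) Σ J_t.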

Multi-layer RNNs

• RNNs are already "deep" in one dimension (they unroll over many timesteps)
• We can also make them "deep" in another dimension by applying multiple RNNs – this is a multi-layer RNN
• This allows the network to compute more complex representations
  • The lower RNNs should compute lower-level features and the higher RNNs should compute higher-level features
• Multi-layer RNNs are also called stacked RNNs (a short sketch follows this slide)
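One way to build a stacked RNN in PyTorch (an assumed implementation detail, not the lecture's code): the num_layers argument wires the hidden states of layer 1 in as the inputs to layer 2.

```python
import torch
import torch.nn as nn

# A 2-layer (stacked) LSTM: the hidden states of layer 1 are the inputs to layer 2.
rnn = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(3, 10, 64)          # (batch, time, features)
out, (h, c) = rnn(x)
print(out.shape)                    # top layer's hidden states: (3, 10, 128)
print(h.shape)                      # final hidden state per layer: (2, 3, 128)
```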

[Figure: multi-layer deep encoder-decoder machine translation net (Sutskever et al. 2014; Luong et al. 2015). The encoder builds up the sentence meaning of the source sentence "Die Proteste waren am Wochenende eskaliert <EOS>"; its final state is the bottleneck that does the conditioning. The decoder generates the translation "The protests escalated over the weekend <EOS>", feeding in the last generated word at each step. The hidden states from RNN layer i are the inputs to RNN layer i+1; the per-cell numbers in the original figure are illustrative hidden-state values.]

Multi-layer RNNs in practice

• High-performing RNNs are usually multi-layer (but aren't as deep as convolutional or feed-forward networks)
• For example: in a 2017 paper, Britz et al. find that for Neural Machine Translation, 2 to 4 layers is best for the encoder RNN, and 4 layers is best for the decoder RNN
  • Often 2 layers is a lot better than 1, and 3 might be a little better than 2
  • Usually, skip-connections/dense-connections are needed to train deeper RNNs (e.g., 8 layers)
• Transformer-based networks (e.g., BERT) are usually deeper, like 12 or 24 layers
  • You will learn about Transformers later; they have a lot of skipping-like connections

"Massive Exploration of Neural Machine Translation Architectures", Britz et al., 2017. /pdf/1703.03906.pdf

Greedy decoding

• We saw how to generate (or "decode") the target sentence by taking the argmax on each step of the decoder:

  <START> he hit me with a pie  →  he hit me with a pie <END>

• This is greedy decoding (take the most probable word on each step); a minimal decoding loop is sketched below
• Problems with this method?
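A minimal greedy-decoding loop in Python. The `step` function is a hypothetical placeholder wrapping one decoder step of a trained model (it is not an API from the lecture); the toy version below only exists to make the sketch runnable.

```python
import torch

def greedy_decode(step, start_id, end_id, max_len=20):
    """Take the argmax word on each step and feed it back in as the next input.

    `step(prev_id, state) -> (logits, state)` is an assumed wrapper around one
    decoder step of a trained seq2seq model; it is a placeholder, not a real API.
    """
    ids, state = [start_id], None
    for _ in range(max_len):
        logits, state = step(ids[-1], state)
        next_id = int(torch.argmax(logits))   # most probable word on this step
        ids.append(next_id)
        if next_id == end_id:                 # stop at <END>
            break
    return ids

# Dummy decoder step for demonstration: always prefers token 3, then <END>=2.
def toy_step(prev_id, state):
    logits = torch.zeros(5)
    logits[3 if prev_id != 3 else 2] = 1.0
    return logits, state

print(greedy_decode(toy_step, start_id=1, end_id=2))   # [1, 3, 2]
```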

Problems with greedy decoding

• Greedy decoding has no way to undo decisions!
  • Input: il a m'entarté (he hit me with a pie)
  • → he
  • → he hit
  • → he hit a (whoops! no going back now…)
• How to fix this?

Exhaustive search decoding

• Ideally, we want to find a (length T) translation y that maximizes

  $P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)$

• We could try computing all possible sequences y
  • This means that on each step t of the decoder, we're tracking $V^t$ possible partial translations, where V is the vocabulary size
  • This $O(V^T)$ complexity is far too expensive!

Beam search decoding

• Core idea: on each step of the decoder, keep track of the k most probable partial translations (which we call hypotheses)
  • k is the beam size (in practice around 5 to 10)
• A hypothesis $y_1, \dots, y_t$ has a score which is its log probability:

  $\text{score}(y_1, \dots, y_t) = \log P_{LM}(y_1, \dots, y_t \mid x) = \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$

  • Scores are all negative, and a higher score is better
  • We search for high-scoring hypotheses, tracking the top k on each step
• Beam search is not guaranteed to find the optimal solution
  • But it is much more efficient than exhaustive search! (A compact code sketch follows the worked example below.)

Beam search decoding: example

Beam size = k = 2. The blue numbers in the original figures are the scores $\text{score}(y_1, \dots, y_t)$.

• Start with <START> and calculate the probability distribution of the next word.
• Take the top k words and compute their scores:
  he: -0.7 = log P_LM(he | <START>);  I: -0.9 = log P_LM(I | <START>)
• For each of the k hypotheses, find the top k next words and calculate scores:
  he hit: -1.7;  he struck: -2.9;  I was: -1.6;  I got: -1.8
  (e.g. -1.7 = log P_LM(hit | <START> he) + -0.7)
• Of these k² hypotheses, just keep the k with the highest scores: "he hit" (-1.7) and "I was" (-1.6).
• Repeat: expand each kept hypothesis, score, and keep the k best on every step:
  he hit a: -2.8;  he hit me: -2.5;  I was hit: -2.9;  I was struck: -3.8  →  keep "he hit me", "he hit a"
  he hit me with: -3.3;  he hit me on: -3.5;  he hit a pie: -3.4;  he hit a tart: -4.0  →  keep "… me with", "… a pie"
  he hit me with a: -3.7;  he hit me with one: -4.3;  …  →  keep "… with a", "… with one"
  he hit me with a pie: -4.3;  he hit me with a tart: -4.6;  he hit me with one pie: -5.0;  he hit me with one tart: -5.3
• "he hit me with a pie" (-4.3) is the top-scoring hypothesis!
• Backtrack to obtain the full hypothesis.
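A compact code sketch of this procedure, under the same assumption as the greedy sketch above (a hypothetical `step` function returning next-word log-probabilities). It keeps the k best partial hypotheses per step, sets completed ones aside, and length-normalizes at the end, as described on the next two slides.

```python
import torch

def beam_search(step, start_id, end_id, k=2, max_len=20):
    """Keep the k highest-scoring partial hypotheses (beams) at every step.

    `step(prev_id, state) -> (log_probs, state)` is an assumed wrapper around one
    decoder step; score(y1..yt) = sum of log P_LM(yi | y<i, x).
    """
    beams = [([start_id], 0.0, None)]            # (tokens, score, state)
    completed = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            log_probs, new_state = step(tokens[-1], state)
            top = torch.topk(log_probs, k)       # top-k next words for this beam
            for lp, idx in zip(top.values, top.indices):
                candidates.append((tokens + [int(idx)], score + float(lp), new_state))
        # Of these (up to) k*k hypotheses, keep only the k best
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
        still_open = []
        for tokens, score, state in beams:
            (completed if tokens[-1] == end_id else still_open).append((tokens, score, state))
        beams = still_open
        if not beams:                            # everything has produced <END>
            break
    completed += beams                           # unfinished beams at the length cutoff
    # Length-normalise so longer hypotheses are not unfairly penalised
    best = max(completed, key=lambda b: b[1] / (len(b[0]) - 1))
    return best[0], best[1]
```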

Beam search decoding: stopping criterion

• In greedy decoding, usually we decode until the model produces an <END> token
  • For example: <START> he hit me with a pie <END>
• In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
  • When a hypothesis produces <END>, that hypothesis is complete.
  • Place it aside and continue exploring other hypotheses via beam search.
• Usually we continue beam search until:
  • We reach timestep T (where T is some pre-defined cutoff), or
  • We have at least n completed hypotheses (where n is a pre-defined cutoff)

Beam search decoding: finishing up

• We have our list of completed hypotheses.
• How to select the top one with the highest score?
• Each hypothesis $y_1, \dots, y_t$ on our list has a score:

  $\text{score}(y_1, \dots, y_t) = \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$

• Problem with this: longer hypotheses have lower scores
• Fix: normalize by length. Use this to select the top one instead (a one-line sketch follows):

  $\frac{1}{t} \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$
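A one-line view of that fix, with toy numbers borrowed from the worked example above (they are partial scores there, reused here purely for illustration):

```python
# Select the hypothesis with the best per-word average log-probability,
# so longer hypotheses are not penalised just for having more terms in the sum.
def best_hypothesis(hyps):
    return max(hyps, key=lambda h: h[1] / len(h[0]))

print(best_hypothesis([(["he", "hit", "me"], -2.5),
                       (["he", "hit", "me", "with", "a", "pie"], -4.3)]))
# The longer hypothesis wins: -4.3/6 > -2.5/3.
```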

Advantages of NMT

Compared to SMT, NMT has many advantages:
• Better performance
  • More fluent
  • Better use of context
  • Better use of phrase similarities
• A single neural network to be optimized end-to-end
  • No subcomponents to be individually optimized
• Requires much less human engineering effort
  • No feature engineering
  • Same method for all language pairs

Disadvantages of NMT?

Compared to SMT:
• NMT is less interpretable
  • Hard to debug
• NMT is difficult to control
  • For example, can't easily specify rules or guidelines for translation
  • Safety concerns!

How do we evaluate Machine Translation?

BLEU (Bilingual Evaluation Understudy). You'll see BLEU in detail in Assignment 4!

• BLEU compares the machine-written translation to one or several human-written translation(s), and computes a similarity score based on:
  • n-gram precision (usually for 1-, 2-, 3- and 4-grams)
  • Plus a penalty for too-short system translations
• BLEU is useful but imperfect (a rough sketch of the computation follows)
  • There are many valid ways to translate a sentence
  • So a good translation can get a poor BLEU score because it has low n-gram overlap with the human translation

Source: "BLEU: a Method for Automatic Evaluation of Machine Translation", Papineni et al., 2002. /anthology/P02-1040
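As a rough sketch only (real BLEU is computed over a whole corpus, supports multiple references, and has its own clipping and smoothing conventions; a library such as sacrebleu would normally be used), modified n-gram precision plus a brevity penalty looks roughly like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision (n=1..4),
    geometric mean, times a brevity penalty for too-short candidates."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped counts
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_prec)

print(round(simple_bleu("he hit me with a pie".split(),
                        "he hit me with a pie".split()), 3))   # 1.0
print(round(simple_bleu("he hit me".split(),
                        "he hit me with a pie".split()), 3))   # much lower: short & partial
```

The second call shows the brevity penalty and the missing 4-grams pulling the score down for a short partial translation.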

MT progress over time

[Chart: Cased BLEU (0-45) on Edinburgh En-De WMT newstest2013, 2013-2019, comparing phrase-based SMT, syntax-based SMT, and Neural MT; NMT 2015 from U. Montréal, NMT 2019 from FAIR on newstest2019.]

Sources: http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf & /

NMT: perhaps the biggest success story of NLP Deep Learning?

Neural Machine Translation went from a fringe research attempt in 2014 to the leading standard method in 2016.

• 2014: First seq2seq paper published
• 2016: Google Translate switches from SMT to NMT – and by 2018 everyone has
• This is amazing!
  • SMT systems, built by hundreds of engineers over many years, outperformed by NMT systems trained by a small group of engineers in a few months

So, is Machine Translation solved?

• Nope!
• Many difficulties remain:
  • Out-of-vocabulary words
  • Domain mismatch between train and test data
  • Maintaining context over longer text
  • Low-resource language pairs
  • Failures to accurately capture sentence meaning
  • Pronoun (or zero pronoun) resolution errors
  • Morphological agreement errors

Further reading: "Has AI surpassed humans at translation? Not even close!" /editorials/state_of_nmt

So is Machine Translation solved?

• Nope!
• Using common sense is still hard

So is Machine Translation solved?

• Nope!
• NMT picks up biases in training data

[Figure: translation example where the source didn't specify gender]

Source: /bias-sexist-or-this-is-the-way-it-should-be-ce1f7c8c683c

So is Machine Translation solved?

• Nope!
• Uninterpretable systems do strange things
  • (But I think this problem has been fixed in Google Translate by 2021?)

Picture source: /en_uk/article/j5npeg/why-is-google-translate-spitting-out-sinister-religious-prophecies
Explanation: /briefs/google-nmt-prophecies

NMT research continues

NMT is a flagship task for NLP Deep Learning.

• NMT research has pioneered many of the recent innovations of NLP Deep Learning
• In 2021: NMT research continues to thrive
• Researchers have found many, many improvements to the "vanilla" seq2seq NMT system we've just presented
• But we'll present in a minute one improvement so integral that it is the new vanilla… ATTENTION

Assignment 4: Cherokee-English machine translation!

• Cherokee is an endangered Native American language – about 2000 fluent speakers. Extremely low resource: about 20k parallel sentences available, most from the Bible.

  ᎪᎯᎩᏴᏥᎨᏒᎢᎦᎵᏉᎩᎢᏯᏂᎢᎠᏂᏧᏣ.ᏂᎪᎯᎸᎢᏗᎦᎳᏫᎢᏍᏗᎢᏩᏂᏯᎡᎢ
  ᏓᎾᏁᎶᎲᏍᎬᎢᏅᏯᎪᏢᏔᏅᎢᎦᏆᏗᎠᏂᏐᏆᎴᎵᏙᎲᎢᎠᎴᎤᏓᏍᏈᏗᎦᎾᏍᏗᎠᏅᏗᏍᎨᎢ
  ᎠᏅᏂᎲᎢ.

  Long ago were seven boys who used to spend all their time down by the townhouse playing games, rolling a stone wheel along the ground, sliding and striking it with a stick.

• Writing system is a syllabary of symbols for each CV unit (85 letters)
• Many thanks to Shiyue Zhang, Benjamin Frey, and Mohit Bansal from UNC Chapel Hill for the resources for this assignment!
• Cherokee is not available on Google Translate!

Cherokee

• Cherokee originally lived in western North Carolina and eastern Tennessee. Most speakers now in Oklahoma, following the Trail of Tears; some in NC
• Writing system invented by Sequoyah around 1820 – someone who was previously illiterate
• Very effective: in the following decades Cherokee literacy was higher than for white people in the southeastern United States

Section 3: Attention

Sequence-to-sequence: the bottleneck problem

[Figure: the Encoder RNN encodes the source sentence (input) "il a m'entarté" into a single encoding; the Decoder RNN then generates the target sentence (output) "he hit me with a pie <END>". The single encoding of the source sentence needs to capture all information about the source sentence: an information bottleneck!]

Problems with this architecture?

Attention

• Attention provides a solution to the bottleneck problem.
• Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence (a minimal sketch in code follows below).
• First, we will show this via a diagram (no equations), then we will show it with equations.
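A minimal dot-product attention step as a hedged sketch; it anticipates the equations the lecture builds up to, and the shapes and names are my own, not the lecture's notation.

```python
import torch
import torch.nn.functional as F

def attention_step(dec_hidden, enc_hiddens):
    """dec_hidden: (hidden,). enc_hiddens: (src_len, hidden).
    Returns the attention distribution and the attention output (context vector)."""
    scores = enc_hiddens @ dec_hidden            # dot product with each encoder state
    alpha = F.softmax(scores, dim=0)             # attention distribution over source words
    context = alpha @ enc_hiddens                # weighted sum of encoder hidden states
    return alpha, context

enc = torch.randn(4, 128)        # encoder hidden states, e.g. for "il a m'entarté"
dec = torch.randn(128)           # current decoder hidden state
alpha, context = attention_step(dec, enc)
print(alpha.shape, context.shape)     # torch.Size([4]) torch.Size([128])
# The context vector is then typically combined (e.g. concatenated) with
# dec_hidden to predict the next target word.
```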

Sequence-to-sequence with attention

[Figure, repeated over several build-up slides: the Encoder RNN encodes the source sentence (input) "il a m'entarté"; on each step of the Decoder RNN (starting from <START>), dot products between the current decoder hidden state and each encoder hidden state give attention scores over the source positions …]
