
Natural Language Processing with Deep Learning

CS224N/Ling284

Christopher Manning

Lecture 7: Machine Translation, Sequence-to-Sequence and Attention

Lecture Plan

Today we will:
1. Introduce a new task: Machine Translation [15 mins]
   … which is a major use-case of …
2. A new neural architecture: sequence-to-sequence [45 mins]
   … which is improved by …
3. A new neural technique: attention [20 mins]

Announcements:
• Assignment 3 is due today – I hope your dependency parsers are parsing text!
• Assignment 4 out today – covered in this lecture; you get 9 days for it (!), due Thu
• Get started early! It's bigger and harder than the previous assignments
• Thursday's lecture about choosing final projects

Section 1: Pre-Neural Machine Translation

Machine Translation

Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language).

x: L'homme est né libre, et partout il est dans les fers
y: Man is born free, but everywhere he is in chains
– Rousseau

The early history of MT: 1950s

• Machine translation research began in the early 1950s on machines less powerful than high-school calculators
• Foundational work on automata, formal languages, probabilities, and information theory
• MT heavily funded by the military, but basically just simple rule-based systems doing word substitution
• Human language is more complicated than that, and varies more across languages!
• Little understanding of natural language syntax, semantics, pragmatics
• Problem soon appeared intractable

1-minute video showing 1954 MT: https://youtu.be/K-HfpsHPmvw

1990s-2010s: Statistical Machine Translation

• Core idea: Learn a probabilistic model from data
• Suppose we're translating French → English.
• We want to find the best English sentence y, given the French sentence x:

  $\arg\max_y P(y \mid x)$

• Use Bayes' Rule to break this down into two components to be learned separately (a toy scoring example follows this slide):

  $= \arg\max_y P(x \mid y)\, P(y)$

  • Translation Model $P(x \mid y)$: models how words and phrases should be translated (fidelity). Learnt from parallel data.
  • Language Model $P(y)$: models how to write good English (fluency). Learnt from monolingual data.
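To make the noisy-channel idea concrete, here is a toy Python sketch. All candidate translations and probabilities are invented for illustration only (they are not from the lecture); the point is that the argmax combines a fidelity score and a fluency score.

```python
import math

# Toy noisy-channel scoring: pick the y maximizing P(x|y) * P(y),
# i.e. log P(x|y) + log P(y). All numbers below are made up.
translation_model = {          # P(x | y): fidelity, learnt from parallel data
    "he is in chains": 0.30,
    "he is in irons": 0.25,
    "him is chain": 0.40,      # faithful word-for-word, but bad English
}
language_model = {             # P(y): fluency, learnt from monolingual data
    "he is in chains": 0.010,
    "he is in irons": 0.008,
    "him is chain": 0.0001,
}

def score(y):
    return math.log(translation_model[y]) + math.log(language_model[y])

best = max(translation_model, key=score)
print(best)  # "he is in chains": fluent AND faithful wins the argmax
```

The word-for-word candidate gets the highest translation-model score but a very low language-model score, so the fluent and faithful candidate wins the argmax.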

1990s-2010s: Statistical Machine Translation

• Question: How to learn the translation model?
• First, need a large amount of parallel data (e.g., pairs of human-translated French/English sentences)

The Rosetta Stone: Ancient Egyptian, Demotic, Ancient Greek

Learning alignment for SMT

• Question: How to learn the translation model from the parallel corpus?
• Break it down further: introduce a latent variable a into the model:

  $P(x, a \mid y)$

  where a is the alignment, i.e. the word-level correspondence between source sentence x and target sentence y.

What is alignment?

Alignment is the correspondence between particular words in the translated sentence pair.

• Typological differences between languages lead to complicated alignments!
• Note: some words have no counterpart

Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. /anthology/J93-2003

Alignment is complex

• Alignment can be many-to-one
• Alignment can be one-to-many
• Alignment can be many-to-many (phrase-level)

(Examples from Brown et al., 1993, as above.)

Learning alignment for SMT

• We learn $P(x, a \mid y)$ as a combination of many factors, including:
  • Probability of particular words aligning (also depends on position in sentence)
  • Probability of particular words having a particular fertility (number of corresponding words)
  • etc.
• Alignments a are latent variables: they aren't explicitly specified in the data!
• They require the use of special learning algorithms (like Expectation-Maximization) for learning the parameters of distributions with latent variables (a minimal EM sketch follows this list)
• In older days, we used to do a lot of that in CS224N, but now see CS228!
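The lecture does not give the algorithm itself, but a minimal EM sketch in the spirit of IBM Model 1 (word-translation probabilities only, ignoring position and fertility, on a tiny invented corpus) shows how alignment parameters can be estimated from sentence pairs alone:

```python
from collections import defaultdict

# Tiny invented parallel corpus: French -> English
corpus = [
    ("la maison".split(),  "the house".split()),
    ("la fleur".split(),   "the flower".split()),
    ("une maison".split(), "a house".split()),
]

# Initialise t(f|e) uniformly over the French vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                       # EM iterations
    count = defaultdict(float)            # expected counts c(f, e)
    total = defaultdict(float)            # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: posterior over which English word e aligns to f
            norm = sum(t[(f, e)] for e in es)
            for e in es:
                p = t[(f, e)] / norm
                count[(f, e)] += p
                total[e] += p
    # M-step: re-estimate translation probabilities from expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("maison", "house")], 3))   # high: learnt purely from co-occurrence
print(round(t[("maison", "the")], 3))     # low
```

Each E-step computes a soft alignment posterior for every source word; the M-step re-estimates t(f|e) from those expected counts.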

Decoding for SMT

• Question: How to compute this argmax, $\hat{y} = \arg\max_y P(x \mid y)\, P(y)$ (Translation Model × Language Model)?
• We could enumerate every possible y and calculate the probability? → Too expensive!
• Answer: Impose strong independence assumptions in the model, use dynamic programming for globally optimal solutions (e.g. the Viterbi algorithm).
• This process is called decoding.

Decoding for SMT

[Figure: translation options for the German source "er geht ja nicht nach hause", shown word by word, e.g. "he / it / , it / , he", "is / are / goes / go / will be", "yes / of course", "not / do not / does not / is not", "after / to / according to / in", "house / home / chamber / at home"; decoding searches over combinations of these options.]

Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. /core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5

1990s-2010s: Statistical Machine Translation

• SMT was a huge research field
• The best systems were extremely complex
  • Hundreds of important details we haven't mentioned here
  • Systems had many separately-designed subcomponents
  • Lots of feature engineering
    • Need to design features to capture particular language phenomena
  • Require compiling and maintaining extra resources
    • Like tables of equivalent phrases
  • Lots of human effort to maintain
    • Repeated effort for each language pair!

Section 2: Neural Machine Translation

[Image slides: "MT research", "2014 (dramatic reenactment)"]

What is Neural Machine Translation?

• Neural Machine Translation (NMT) is a way to do Machine Translation with a single end-to-end neural network
• The neural network architecture is called a sequence-to-sequence model (aka seq2seq) and it involves two RNNs

Neural Machine Translation (NMT)

The sequence-to-sequence model

[Figure: the Encoder RNN reads the source sentence (input) "il a m'entarté" and produces an encoding of the source sentence, which provides the initial hidden state for the Decoder RNN. The Decoder RNN is a Language Model that generates the target sentence (output) "he hit me with a pie <END>", conditioned on that encoding, taking the argmax word at each step.]

Note: this diagram shows test-time behavior: the decoder output is fed in as the next step's input. (A minimal code sketch of the two-RNN architecture follows.)
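As a concrete sketch of the two-RNN idea, here is a hypothetical minimal PyTorch encoder-decoder (my own implementation choices, not the lecture's exact model): the encoder's final hidden state conditions the decoder.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the final encoder state initialises the decoder."""
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)   # scores over the target vocab

    def forward(self, src_ids, tgt_in_ids):
        # Encoder RNN produces an encoding of the source sentence
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder RNN is a conditional LM, conditioned via the initial hidden state
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), (h, c))
        return self.out(dec_out)                  # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))      # e.g. "il a m'entarté" as ids
tgt_in = torch.randint(0, 1000, (2, 8))   # "<START> he hit me with a pie"
print(model(src, tgt_in).shape)           # torch.Size([2, 8, 1000])
```

At test time you would instead feed each predicted word back in as the next decoder input, as the diagram note says.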

Sequence-to-sequence is versatile!

• Sequence-to-sequence is useful for more than just MT
• Many NLP tasks can be phrased as sequence-to-sequence:
  • Summarization (long text → short text)
  • Dialogue (previous utterances → next utterance)
  • Parsing (input text → output parse as sequence)
  • Code generation (natural language → Python code)

Neural Machine Translation (NMT)

• The sequence-to-sequence model is an example of a Conditional Language Model
  • Language Model, because the decoder is predicting the next word of the target sentence y
  • Conditional, because its predictions are also conditioned on the source sentence x
• NMT directly calculates $P(y \mid x)$:

  $P(y \mid x) = P(y_1 \mid x)\, P(y_2 \mid y_1, x)\, P(y_3 \mid y_1, y_2, x) \cdots P(y_T \mid y_1, \dots, y_{T-1}, x)$

  where each factor is the probability of the next target word, given the target words so far and the source sentence x.
• Question: How to train an NMT system?
• Answer: Get a big parallel corpus…

Training a Neural Machine Translation system

[Figure: the Encoder RNN reads the source sentence (from corpus) "il a m'entarté"; the Decoder RNN is fed "<START> he hit me with a pie" and predicts the target sentence (from corpus) "he hit me with a pie <END>".]

The loss is the average per-step negative log probability of the true next target word:

$J = \frac{1}{T} \sum_{t=1}^{T} J_t$, where $J_t = -\log P(y_t^{*} \mid y_1^{*}, \dots, y_{t-1}^{*}, x)$

e.g., $J_1$ = negative log prob of "he", $J_4$ = negative log prob of "with", $J_7$ = negative log prob of <END>.

Seq2seq is optimized as a single system. Backpropagation operates "end-to-end". (A short code sketch of this loss follows.)
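A minimal sketch of that loss in PyTorch, assuming logits shaped (batch, T, vocab) from a model like the Seq2Seq sketch above; the random tensors below merely stand in for real model outputs and corpus data.

```python
import torch
import torch.nn as nn

# J = (1/T) * sum_t J_t, with J_t = -log P(true word t | previous true words, x).
batch, T, vocab = 2, 8, 1000
logits = torch.randn(batch, T, vocab, requires_grad=True)  # stand-in for model output
gold = torch.randint(0, vocab, (batch, T))                  # target sentence from corpus

loss_fn = nn.CrossEntropyLoss()                # mean negative log-likelihood
J = loss_fn(logits.reshape(-1, vocab), gold.reshape(-1))
J.backward()                                   # backpropagation operates "end-to-end"
print(J.item())
```

nn.CrossEntropyLoss averages the negative log probability of the gold word over all positions, which matches J = (1/T) Σ J_t.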

Multi-layer RNNs

• RNNs are already "deep" in one dimension (they unroll over many timesteps)
• We can also make them "deep" in another dimension by applying multiple RNNs – this is a multi-layer RNN
• This allows the network to compute more complex representations
  • The lower RNNs should compute lower-level features and the higher RNNs should compute higher-level features
• Multi-layer RNNs are also called stacked RNNs (a short sketch follows this slide)
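One way to build a stacked RNN in PyTorch (an assumed implementation detail, not the lecture's code): the num_layers argument wires the hidden states of layer 1 in as the inputs to layer 2.

```python
import torch
import torch.nn as nn

# A 2-layer (stacked) LSTM: the hidden states of layer 1 are the inputs to layer 2.
rnn = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(3, 10, 64)          # (batch, time, features)
out, (h, c) = rnn(x)
print(out.shape)                    # top layer's hidden states: (3, 10, 128)
print(h.shape)                      # final hidden state per layer: (2, 3, 128)
```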

[Figure: multi-layer deep encoder-decoder machine translation net (Sutskever et al. 2014; Luong et al. 2015). The encoder builds up the sentence meaning of the source sentence "Die Proteste waren am Wochenende eskaliert <EOS>"; its final state is the bottleneck that does the conditioning. The decoder generates the translation "The protests escalated over the weekend <EOS>", feeding in the last generated word at each step. The hidden states from RNN layer i are the inputs to RNN layer i+1; the per-cell numbers in the original figure are illustrative hidden-state values.]

Multi-layer RNNs in practice

• High-performing RNNs are usually multi-layer (but aren't as deep as convolutional or feed-forward networks)
• For example: in a 2017 paper, Britz et al. find that for Neural Machine Translation, 2 to 4 layers is best for the encoder RNN, and 4 layers is best for the decoder RNN
  • Often 2 layers is a lot better than 1, and 3 might be a little better than 2
  • Usually, skip-connections/dense-connections are needed to train deeper RNNs (e.g., 8 layers)
• Transformer-based networks (e.g., BERT) are usually deeper, like 12 or 24 layers
  • You will learn about Transformers later; they have a lot of skipping-like connections

"Massive Exploration of Neural Machine Translation Architectures", Britz et al., 2017. /pdf/1703.03906.pdf

Greedy decoding

• We saw how to generate (or "decode") the target sentence by taking the argmax on each step of the decoder:

  <START> he hit me with a pie  →  he hit me with a pie <END>

• This is greedy decoding (take the most probable word on each step); a minimal decoding loop is sketched below
• Problems with this method?
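A minimal greedy-decoding loop in Python. The `step` function is a hypothetical placeholder wrapping one decoder step of a trained model (it is not an API from the lecture); the toy version below only exists to make the sketch runnable.

```python
import torch

def greedy_decode(step, start_id, end_id, max_len=20):
    """Take the argmax word on each step and feed it back in as the next input.

    `step(prev_id, state) -> (logits, state)` is an assumed wrapper around one
    decoder step of a trained seq2seq model; it is a placeholder, not a real API.
    """
    ids, state = [start_id], None
    for _ in range(max_len):
        logits, state = step(ids[-1], state)
        next_id = int(torch.argmax(logits))   # most probable word on this step
        ids.append(next_id)
        if next_id == end_id:                 # stop at <END>
            break
    return ids

# Dummy decoder step for demonstration: always prefers token 3, then <END>=2.
def toy_step(prev_id, state):
    logits = torch.zeros(5)
    logits[3 if prev_id != 3 else 2] = 1.0
    return logits, state

print(greedy_decode(toy_step, start_id=1, end_id=2))   # [1, 3, 2]
```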

Problems with greedy decoding

• Greedy decoding has no way to undo decisions!
  • Input: il a m'entarté (he hit me with a pie)
  • → he
  • → he hit
  • → he hit a (whoops! no going back now…)
• How to fix this?

Exhaustive search decoding

• Ideally, we want to find a (length T) translation y that maximizes

  $P(y \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)$

• We could try computing all possible sequences y
  • This means that on each step t of the decoder, we're tracking $V^t$ possible partial translations, where V is the vocabulary size
  • This $O(V^T)$ complexity is far too expensive!

Beam search decoding

• Core idea: on each step of the decoder, keep track of the k most probable partial translations (which we call hypotheses)
  • k is the beam size (in practice around 5 to 10)
• A hypothesis $y_1, \dots, y_t$ has a score which is its log probability:

  $\text{score}(y_1, \dots, y_t) = \log P_{LM}(y_1, \dots, y_t \mid x) = \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$

  • Scores are all negative, and a higher score is better
  • We search for high-scoring hypotheses, tracking the top k on each step
• Beam search is not guaranteed to find the optimal solution
  • But it is much more efficient than exhaustive search! (A compact code sketch follows the worked example below.)

Beam search decoding: example

Beam size = k = 2. The blue numbers in the original figures are the scores $\text{score}(y_1, \dots, y_t)$.

• Start with <START> and calculate the probability distribution of the next word.
• Take the top k words and compute their scores:
  he: -0.7 = log P_LM(he | <START>);  I: -0.9 = log P_LM(I | <START>)
• For each of the k hypotheses, find the top k next words and calculate scores:
  he hit: -1.7;  he struck: -2.9;  I was: -1.6;  I got: -1.8
  (e.g. -1.7 = log P_LM(hit | <START> he) + -0.7)
• Of these k² hypotheses, just keep the k with the highest scores: "he hit" (-1.7) and "I was" (-1.6).
• Repeat: expand each kept hypothesis, score, and keep the k best on every step:
  he hit a: -2.8;  he hit me: -2.5;  I was hit: -2.9;  I was struck: -3.8  →  keep "he hit me", "he hit a"
  he hit me with: -3.3;  he hit me on: -3.5;  he hit a pie: -3.4;  he hit a tart: -4.0  →  keep "… me with", "… a pie"
  he hit me with a: -3.7;  he hit me with one: -4.3;  …  →  keep "… with a", "… with one"
  he hit me with a pie: -4.3;  he hit me with a tart: -4.6;  he hit me with one pie: -5.0;  he hit me with one tart: -5.3
• "he hit me with a pie" (-4.3) is the top-scoring hypothesis!
• Backtrack to obtain the full hypothesis.
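A compact code sketch of this procedure, under the same assumption as the greedy sketch above (a hypothetical `step` function returning next-word log-probabilities). It keeps the k best partial hypotheses per step, sets completed ones aside, and length-normalizes at the end, as described on the next two slides.

```python
import torch

def beam_search(step, start_id, end_id, k=2, max_len=20):
    """Keep the k highest-scoring partial hypotheses (beams) at every step.

    `step(prev_id, state) -> (log_probs, state)` is an assumed wrapper around one
    decoder step; score(y1..yt) = sum of log P_LM(yi | y<i, x).
    """
    beams = [([start_id], 0.0, None)]            # (tokens, score, state)
    completed = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            log_probs, new_state = step(tokens[-1], state)
            top = torch.topk(log_probs, k)       # top-k next words for this beam
            for lp, idx in zip(top.values, top.indices):
                candidates.append((tokens + [int(idx)], score + float(lp), new_state))
        # Of these (up to) k*k hypotheses, keep only the k best
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
        still_open = []
        for tokens, score, state in beams:
            (completed if tokens[-1] == end_id else still_open).append((tokens, score, state))
        beams = still_open
        if not beams:                            # everything has produced <END>
            break
    completed += beams                           # unfinished beams at the length cutoff
    # Length-normalise so longer hypotheses are not unfairly penalised
    best = max(completed, key=lambda b: b[1] / (len(b[0]) - 1))
    return best[0], best[1]
```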

Beam search decoding: stopping criterion

• In greedy decoding, usually we decode until the model produces an <END> token
  • For example: <START> he hit me with a pie <END>
• In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
  • When a hypothesis produces <END>, that hypothesis is complete.
  • Place it aside and continue exploring other hypotheses via beam search.
• Usually we continue beam search until:
  • We reach timestep T (where T is some pre-defined cutoff), or
  • We have at least n completed hypotheses (where n is a pre-defined cutoff)

Beam search decoding: finishing up

• We have our list of completed hypotheses.
• How to select the top one with the highest score?
• Each hypothesis $y_1, \dots, y_t$ on our list has a score:

  $\text{score}(y_1, \dots, y_t) = \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$

• Problem with this: longer hypotheses have lower scores
• Fix: normalize by length. Use this to select the top one instead (a one-line sketch follows):

  $\frac{1}{t} \sum_{i=1}^{t} \log P_{LM}(y_i \mid y_1, \dots, y_{i-1}, x)$
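A one-line view of that fix, with toy numbers borrowed from the worked example above (they are partial scores there, reused here purely for illustration):

```python
# Select the hypothesis with the best per-word average log-probability,
# so longer hypotheses are not penalised just for having more terms in the sum.
def best_hypothesis(hyps):
    return max(hyps, key=lambda h: h[1] / len(h[0]))

print(best_hypothesis([(["he", "hit", "me"], -2.5),
                       (["he", "hit", "me", "with", "a", "pie"], -4.3)]))
# The longer hypothesis wins: -4.3/6 > -2.5/3.
```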

Advantages of NMT

Compared to SMT, NMT has many advantages:
• Better performance
  • More fluent
  • Better use of context
  • Better use of phrase similarities
• A single neural network to be optimized end-to-end
  • No subcomponents to be individually optimized
• Requires much less human engineering effort
  • No feature engineering
  • Same method for all language pairs

Disadvantages of NMT?

Compared to SMT:
• NMT is less interpretable
  • Hard to debug
• NMT is difficult to control
  • For example, can't easily specify rules or guidelines for translation
  • Safety concerns!

How do we evaluate Machine Translation?

BLEU (Bilingual Evaluation Understudy). You'll see BLEU in detail in Assignment 4!

• BLEU compares the machine-written translation to one or several human-written translation(s), and computes a similarity score based on:
  • n-gram precision (usually for 1-, 2-, 3- and 4-grams)
  • Plus a penalty for too-short system translations
• BLEU is useful but imperfect (a rough sketch of the computation follows)
  • There are many valid ways to translate a sentence
  • So a good translation can get a poor BLEU score because it has low n-gram overlap with the human translation

Source: "BLEU: a Method for Automatic Evaluation of Machine Translation", Papineni et al., 2002. /anthology/P02-1040
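As a rough sketch only (real BLEU is computed over a whole corpus, supports multiple references, and has its own clipping and smoothing conventions; a library such as sacrebleu would normally be used), modified n-gram precision plus a brevity penalty looks roughly like this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: clipped n-gram precision (n=1..4),
    geometric mean, times a brevity penalty for too-short candidates."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped counts
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(log_prec)

print(round(simple_bleu("he hit me with a pie".split(),
                        "he hit me with a pie".split()), 3))   # 1.0
print(round(simple_bleu("he hit me".split(),
                        "he hit me with a pie".split()), 3))   # much lower: short & partial
```

The second call shows the brevity penalty and the missing 4-grams pulling the score down for a short partial translation.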

MT progress over time

[Chart: Cased BLEU (0-45) on Edinburgh En-De WMT newstest2013, 2013-2019, comparing phrase-based SMT, syntax-based SMT, and Neural MT; NMT 2015 from U. Montréal, NMT 2019 from FAIR on newstest2019.]

Sources: http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf & /

NMT: perhaps the biggest success story of NLP Deep Learning?

Neural Machine Translation went from a fringe research attempt in 2014 to the leading standard method in 2016.

• 2014: First seq2seq paper published
• 2016: Google Translate switches from SMT to NMT – and by 2018 everyone has
• This is amazing!
  • SMT systems, built by hundreds of engineers over many years, outperformed by NMT systems trained by a small group of engineers in a few months

So, is Machine Translation solved?

• Nope!
• Many difficulties remain:
  • Out-of-vocabulary words
  • Domain mismatch between train and test data
  • Maintaining context over longer text
  • Low-resource language pairs
  • Failures to accurately capture sentence meaning
  • Pronoun (or zero pronoun) resolution errors
  • Morphological agreement errors

Further reading: "Has AI surpassed humans at translation? Not even close!" /editorials/state_of_nmt

So is Machine Translation solved?

• Nope!
• Using common sense is still hard

So is Machine Translation solved?

• Nope!
• NMT picks up biases in training data

[Figure: translation example where the source didn't specify gender]

Source: /bias-sexist-or-this-is-the-way-it-should-be-ce1f7c8c683c

So is Machine Translation solved?

• Nope!
• Uninterpretable systems do strange things
  • (But I think this problem has been fixed in Google Translate by 2021?)

Picture source: /en_uk/article/j5npeg/why-is-google-translate-spitting-out-sinister-religious-prophecies
Explanation: /briefs/google-nmt-prophecies

NMT research continues

NMT is a flagship task for NLP Deep Learning.

• NMT research has pioneered many of the recent innovations of NLP Deep Learning
• In 2021: NMT research continues to thrive
• Researchers have found many, many improvements to the "vanilla" seq2seq NMT system we've just presented
• But we'll present in a minute one improvement so integral that it is the new vanilla… ATTENTION

Assignment 4: Cherokee-English machine translation!

• Cherokee is an endangered Native American language – about 2000 fluent speakers. Extremely low resource: about 20k parallel sentences available, most from the Bible.

  ᎪᎯᎩᏴᏥᎨᏒᎢᎦᎵᏉᎩᎢᏯᏂᎢᎠᏂᏧᏣ.ᏂᎪᎯᎸᎢᏗᎦᎳᏫᎢᏍᏗᎢᏩᏂᏯᎡᎢ
  ᏓᎾᏁᎶᎲᏍᎬᎢᏅᏯᎪᏢᏔᏅᎢᎦᏆᏗᎠᏂᏐᏆᎴᎵᏙᎲᎢᎠᎴᎤᏓᏍᏈᏗᎦᎾᏍᏗᎠᏅᏗᏍᎨᎢ
  ᎠᏅᏂᎲᎢ.

  Long ago were seven boys who used to spend all their time down by the townhouse playing games, rolling a stone wheel along the ground, sliding and striking it with a stick.

• Writing system is a syllabary of symbols for each CV unit (85 letters)
• Many thanks to Shiyue Zhang, Benjamin Frey, and Mohit Bansal from UNC Chapel Hill for the resources for this assignment!
• Cherokee is not available on Google Translate!

Cherokee

• Cherokee originally lived in western North Carolina and eastern Tennessee. Most speakers now in Oklahoma, following the Trail of Tears; some in NC
• Writing system invented by Sequoyah around 1820 – someone who was previously illiterate
• Very effective: in the following decades Cherokee literacy was higher than for white people in the southeastern United States

Section 3: Attention

Sequence-to-sequence: the bottleneck problem

[Figure: the Encoder RNN encodes the source sentence (input) "il a m'entarté" into a single encoding; the Decoder RNN then generates the target sentence (output) "he hit me with a pie <END>". The single encoding of the source sentence needs to capture all information about the source sentence: an information bottleneck!]

Problems with this architecture?

Attention

• Attention provides a solution to the bottleneck problem.
• Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence (a minimal sketch in code follows below).
• First, we will show this via a diagram (no equations), then we will show it with equations.
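A minimal dot-product attention step as a hedged sketch; it anticipates the equations the lecture builds up to, and the shapes and names are my own, not the lecture's notation.

```python
import torch
import torch.nn.functional as F

def attention_step(dec_hidden, enc_hiddens):
    """dec_hidden: (hidden,). enc_hiddens: (src_len, hidden).
    Returns the attention distribution and the attention output (context vector)."""
    scores = enc_hiddens @ dec_hidden            # dot product with each encoder state
    alpha = F.softmax(scores, dim=0)             # attention distribution over source words
    context = alpha @ enc_hiddens                # weighted sum of encoder hidden states
    return alpha, context

enc = torch.randn(4, 128)        # encoder hidden states, e.g. for "il a m'entarté"
dec = torch.randn(128)           # current decoder hidden state
alpha, context = attention_step(dec, enc)
print(alpha.shape, context.shape)     # torch.Size([4]) torch.Size([128])
# The context vector is then typically combined (e.g. concatenated) with
# dec_hidden to predict the next target word.
```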

Sequence-to-sequence with attention

[Figure, repeated over several build-up slides: the Encoder RNN encodes the source sentence (input) "il a m'entarté"; on each step of the Decoder RNN (starting from <START>), dot products between the current decoder hidden state and each encoder hidden state give attention scores over the source positions …]
