




Natural Language Processing
with Deep Learning
CS224N/Ling284
Christopher Manning
Lecture 7: Machine Translation, Sequence-to-Sequence and Attention
Lecture Plan
Today we will:
1. Introduce a new task: Machine Translation [15 mins], which is a major use-case of …
2. A new neural architecture: sequence-to-sequence [45 mins], which is improved by …
3. A new neural technique: attention [20 mins]
Announcements
• Assignment 3 is due today – I hope your dependency parsers are parsing text!
• Assignment 4 out today – covered in this lecture, you get 9 days for it (!), due Thu
• Get started early! It's bigger and harder than the previous assignments
• Thursday's lecture is about choosing final projects
Section 1: Pre-Neural Machine Translation
Machine Translation
Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language).
x: L'homme est né libre, et partout il est dans les fers
y: Man is born free, but everywhere he is in chains
– Rousseau
The early history of MT: 1950s
• Machine translation research began in the early 1950s on machines less powerful than high-school calculators
• Foundational work on automata, formal languages, probabilities, and information theory
• MT was heavily funded by the military, but it was basically just simple rule-based systems doing word substitution
• Human language is more complicated than that, and varies more across languages!
• Little understanding of natural language syntax, semantics, pragmatics
• The problem soon appeared intractable
1-minute video showing 1954 MT: https://youtu.be/K-HfpsHPmvw
1990s-2010s: Statistical Machine Translation
• Core idea: Learn a probabilistic model from data
• Suppose we're translating French → English.
• We want to find the best English sentence y, given French sentence x:
    argmax_y P(y|x)
• Use Bayes Rule to break this down into two components to be learned separately:
    = argmax_y P(x|y) P(y)
  – Translation Model P(x|y): models how words and phrases should be translated (fidelity). Learnt from parallel data.
  – Language Model P(y): models how to write good English (fluency). Learnt from monolingual data.
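Written out, the Bayes Rule step behind this decomposition is the following (P(x) does not depend on y, so it drops out of the argmax):

    \hat{y} = \operatorname*{argmax}_y P(y \mid x)
            = \operatorname*{argmax}_y \frac{P(x \mid y)\, P(y)}{P(x)}
            = \operatorname*{argmax}_y P(x \mid y)\, P(y)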
1990s-2010s: Statistical Machine Translation
• Question: How to learn the translation model P(x|y)?
• First, need a large amount of parallel data
  (e.g., pairs of human-translated French/English sentences)
(Image) The Rosetta Stone: the same text in Ancient Egyptian, Demotic, and Ancient Greek – a classic example of parallel data.
Learning alignment for SMT
• Question: How to learn the translation model P(x|y) from the parallel corpus?
• Break it down further: Introduce a latent variable a into the model:
    P(x, a | y)
  where a is the alignment, i.e. the word-level correspondence between source sentence x and target sentence y
What is alignment?
Alignment is the correspondence between particular words in the translated sentence pair.
• Typological differences between languages lead to complicated alignments!
• Note: Some words have no counterpart
Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. /anthology/J93-2003
Alignment is complex
• Alignment can be many-to-one
• Alignment can be one-to-many
• Alignment can be many-to-many (phrase-level)
Examples from: "The Mathematics of Statistical Machine Translation: Parameter Estimation", Brown et al., 1993. /anthology/J93-2003
Learning alignment for SMT
• We learn P(x, a | y) as a combination of many factors, including:
  – Probability of particular words aligning (also depends on position in the sentence)
  – Probability of particular words having a particular fertility (number of corresponding words)
  – etc.
• Alignments a are latent variables: they aren't explicitly specified in the data!
  – Requires the use of special learning algorithms (like Expectation-Maximization) for learning the parameters of distributions with latent variables
  – In older days, we used to do a lot of that in CS224N, but now see CS228!
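To make the EM idea concrete, here is a minimal sketch of IBM Model 1 alignment learning on a hypothetical two-sentence toy corpus (real SMT alignment models also model position, fertility, and more):

    # Minimal IBM Model 1 sketch: EM over latent word alignments.
    from collections import defaultdict

    def ibm_model1(parallel_corpus, iterations=10):
        # t[f][e] ~ P(f | e): probability that target word e generates source word f
        t = defaultdict(lambda: defaultdict(lambda: 1.0))   # uniform initialization
        for _ in range(iterations):
            count = defaultdict(lambda: defaultdict(float))
            total = defaultdict(float)
            for src, tgt in parallel_corpus:
                for f in src:
                    # E-step: expected (soft) alignment counts, since alignments are latent
                    z = sum(t[f][e] for e in tgt)
                    for e in tgt:
                        c = t[f][e] / z
                        count[f][e] += c
                        total[e] += c
            # M-step: re-estimate translation probabilities from the expected counts
            for f in count:
                for e in count[f]:
                    t[f][e] = count[f][e] / total[e]
        return t

    corpus = [("la maison".split(), "the house".split()),
              ("la fleur".split(), "the flower".split())]
    t = ibm_model1(corpus)
    print(t["maison"]["house"], t["maison"]["the"])   # the first grows larger than the second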
Decoding for SMT
• Question: How to compute this argmax?
    argmax_y P(x|y) P(y)
  (Translation Model × Language Model)
• We could enumerate every possible y and calculate the probability? → Too expensive!
• Answer: Impose strong independence assumptions in the model, use dynamic programming for globally optimal solutions (e.g. the Viterbi algorithm).
• This process is called decoding
Decoding for SMT
(Figure: phrase-based decoding of the German sentence "er geht ja nicht nach hause", showing the many competing translation options for each source span – e.g. "he", "it", "goes", "does not", "is not", "after", "home", "to home" – that the decoder must search over.)
Source: "Statistical Machine Translation", Chapter 6, Koehn, 2009. /core/books/statistical-machine-translation/94EADF9F680558E13BE759997553CDE5
1990s-2010s: Statistical Machine Translation
• SMT was a huge research field
• The best systems were extremely complex
  – Hundreds of important details we haven't mentioned here
  – Systems had many separately-designed subcomponents
  – Lots of feature engineering
    • Need to design features to capture particular language phenomena
  – Required compiling and maintaining extra resources
    • Like tables of equivalent phrases
  – Lots of human effort to maintain
    • Repeated effort for each language pair!
Section 2: Neural Machine Translation
(Two image slides: "MT research, 2014 (dramatic reenactment)")
What is Neural Machine Translation?
• Neural Machine Translation (NMT) is a way to do Machine Translation with a single end-to-end neural network
• The neural network architecture is called a sequence-to-sequence model (aka seq2seq) and it involves two RNNs
Neural Machine Translation (NMT): the sequence-to-sequence model
(Figure: the Encoder RNN reads the source sentence (input) "il a m'entarté" and produces an encoding of the source sentence, which provides the initial hidden state for the Decoder RNN. The Decoder RNN is a Language Model that generates the target sentence (output) "he hit me with a pie <END>" conditioned on the encoding, taking the argmax word on each step starting from <START>.)
Note: This diagram shows test-time behavior: decoder output is fed in as the next step's input.
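A minimal PyTorch sketch of this encoder-decoder setup, with hypothetical class and parameter names (Assignment 4's model is more elaborate, e.g. it adds attention):

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb=256, hidden=512):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.LSTM(emb, hidden, batch_first=True)
            self.decoder = nn.LSTM(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)   # scores over the target vocabulary

        def forward(self, src_ids, tgt_in_ids):
            # Encoder RNN produces an encoding of the source sentence;
            # its final state provides the initial hidden state for the decoder.
            _, enc_state = self.encoder(self.src_emb(src_ids))
            # Decoder RNN is a language model conditioned on that encoding.
            dec_out, _ = self.decoder(self.tgt_emb(tgt_in_ids), enc_state)
            return self.out(dec_out)                  # (batch, tgt_len, tgt_vocab) logits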
Sequence-to-sequence is versatile!
• Sequence-to-sequence is useful for more than just MT
• Many NLP tasks can be phrased as sequence-to-sequence:
  – Summarization (long text → short text)
  – Dialogue (previous utterances → next utterance)
  – Parsing (input text → output parse as sequence)
  – Code generation (natural language → Python code)
Neural Machine Translation (NMT)
• The sequence-to-sequence model is an example of a Conditional Language Model
  – Language Model because the decoder is predicting the next word of the target sentence y
  – Conditional because its predictions are also conditioned on the source sentence x
• NMT directly calculates P(y|x):
    P(y|x) = P(y_1|x) P(y_2|y_1, x) P(y_3|y_1, y_2, x) … P(y_T|y_1, …, y_{T-1}, x)
  where each factor is the probability of the next target word, given the target words so far and the source sentence x
• Question: How to train an NMT system?
• Answer: Get a big parallel corpus…
Training a Neural Machine Translation system
(Figure: the Encoder RNN reads the source sentence "il a m'entarté" (from corpus); the Decoder RNN reads "<START> he hit me with a pie" and predicts the target sentence (from corpus). On each decoder step t, the loss J_t is the negative log probability of the true next target word, e.g. J_1 = -log P("he"), J_5 = -log P("with"), J_7 = -log P(<END>).)
    J = (1/T) Σ_{t=1}^{T} J_t = (1/T) (J_1 + J_2 + J_3 + J_4 + J_5 + J_6 + J_7)
Seq2seq is optimized as a single system. Backpropagation operates "end-to-end".
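A minimal sketch of this objective in PyTorch, reusing the hypothetical Seq2Seq module sketched earlier (teacher forcing: the decoder is fed the gold target shifted right):

    import torch.nn.functional as F

    def training_step(model, optimizer, src_ids, tgt_ids, pad_id=0):
        # Decoder input: <START> y_1 ... y_{T-1}; gold labels: y_1 ... y_T <END>
        logits = model(src_ids, tgt_ids[:, :-1])
        loss = F.cross_entropy(                       # averages the per-step J_t terms
            logits.reshape(-1, logits.size(-1)),
            tgt_ids[:, 1:].reshape(-1),
            ignore_index=pad_id)                      # don't count padding positions
        optimizer.zero_grad()
        loss.backward()                               # backpropagation operates end-to-end
        optimizer.step()
        return loss.item()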
Multi-layer RNNs
• RNNs are already "deep" in one dimension (they unroll over many timesteps)
• We can also make them "deep" in another dimension by applying multiple RNNs – this is a multi-layer RNN.
• This allows the network to compute more complex representations
  – The lower RNNs should compute lower-level features and the higher RNNs should compute higher-level features.
• Multi-layer RNNs are also called stacked RNNs.
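In PyTorch, stacking RNN layers is just a num_layers argument; a tiny sketch with hypothetical sizes (skip-connections for deeper stacks would have to be added by hand):

    import torch
    import torch.nn as nn

    rnn = nn.LSTM(input_size=256, hidden_size=512, num_layers=2, batch_first=True)
    x = torch.randn(8, 20, 256)        # batch of 8 sequences, 20 time steps each
    output, (h_n, c_n) = rnn(x)
    print(output.shape)                # (8, 20, 512): hidden states of the top layer
    print(h_n.shape)                   # (2, 8, 512): final hidden state of each layer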
Multi-layer deep encoder-decoder machine translation net [Sutskever et al. 2014; Luong et al. 2015]
(Figure: a multi-layer encoder RNN builds up the sentence meaning of the source sentence "Die Proteste waren am Wochenende eskaliert <EOS>"; its final hidden states form the bottleneck that conditions the multi-layer decoder, which generates the translation "The protests escalated over the weekend <EOS>", feeding in the last generated word at each step. The hidden states from RNN layer i are the inputs to RNN layer i+1.)
Multi-layer RNNs in practice
• High-performing RNNs are usually multi-layer (but aren't as deep as convolutional or feed-forward networks)
• For example: in a 2017 paper, Britz et al. find that for Neural Machine Translation, 2 to 4 layers is best for the encoder RNN, and 4 layers is best for the decoder RNN
  – Often 2 layers is a lot better than 1, and 3 might be a little better than 2
  – Usually, skip-connections/dense-connections are needed to train deeper RNNs (e.g., 8 layers)
• Transformer-based networks (e.g., BERT) are usually deeper, like 12 or 24 layers.
  – You will learn about Transformers later; they have a lot of skipping-like connections
"Massive Exploration of Neural Machine Translation Architectures", Britz et al., 2017. /pdf/1703.03906.pdf
Greedy decoding
• We saw how to generate (or "decode") the target sentence by taking the argmax on each step of the decoder
  (Figure: the decoder, fed "<START> he hit me with a pie", produces "he hit me with a pie <END>" by taking the argmax at each step.)
• This is greedy decoding (take the most probable word on each step)
• Problems with this method?
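A minimal sketch of this loop, assuming a hypothetical decoder_step(state, word_id) function that returns log-probabilities over the vocabulary and a new decoder state:

    import torch

    def greedy_decode(decoder_step, init_state, start_id, end_id, max_len=50):
        state, word, output = init_state, start_id, []
        for _ in range(max_len):
            log_probs, state = decoder_step(state, word)
            word = int(torch.argmax(log_probs))   # take the most probable word
            if word == end_id:
                break
            output.append(word)                   # fed back in as the next step's input
        return output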
Problems with greedy decoding
• Greedy decoding has no way to undo decisions!
  – Input: il a m'entarté (he hit me with a pie)
  – → he
  – → he hit
  – → he hit a (whoops! no going back now…)
• How to fix this?
Exhaustive search decoding
• Ideally, we want to find a (length T) translation y that maximizes
    P(y|x) = ∏_{t=1}^{T} P(y_t | y_1, …, y_{t-1}, x)
• We could try computing all possible sequences y
  – This means that on each step t of the decoder, we're tracking V^t possible partial translations, where V is the vocab size
  – This O(V^T) complexity is far too expensive!
Beam search decoding
• Core idea: On each step of the decoder, keep track of the k most probable partial translations (which we call hypotheses)
  – k is the beam size (in practice around 5 to 10)
• A hypothesis y_1, …, y_t has a score which is its log probability:
    score(y_1, …, y_t) = log P_LM(y_1, …, y_t | x) = Σ_{i=1}^{t} log P_LM(y_i | y_1, …, y_{i-1}, x)
  – Scores are all negative, and a higher score is better
  – We search for high-scoring hypotheses, tracking the top k on each step
• Beam search is not guaranteed to find the optimal solution
• But it's much more efficient than exhaustive search!
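A minimal sketch of beam search under the same hypothetical decoder_step interface as the greedy sketch above (it also applies the length normalization discussed a few slides later):

    import torch

    def beam_search(decoder_step, init_state, start_id, end_id, k=5, max_len=50):
        # Each hypothesis: (score = sum of log probs, word ids, decoder state)
        beam = [(0.0, [start_id], init_state)]
        completed = []
        for _ in range(max_len):
            candidates = []
            for score, words, state in beam:
                log_probs, new_state = decoder_step(state, words[-1])
                topk = torch.topk(log_probs, k)          # top k next words for this hypothesis
                for lp, wid in zip(topk.values, topk.indices):
                    candidates.append((score + float(lp), words + [int(wid)], new_state))
            # Of these ~k^2 candidates, just keep the k with the highest scores
            candidates.sort(key=lambda c: c[0], reverse=True)
            beam = []
            for score, words, state in candidates[:k]:
                if words[-1] == end_id:
                    completed.append((score, words))     # hypothesis is complete: set it aside
                else:
                    beam.append((score, words, state))
            if not beam:
                break
        if not completed:                                # fall back to the best open hypothesis
            completed = [(score, words) for score, words, _ in beam]
        # Normalize by length so longer hypotheses aren't unfairly penalized
        return max(completed, key=lambda c: c[0] / (len(c[1]) - 1))[1]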
Beam search decoding: example
Beam size = k = 2. Blue numbers in the figure are the scores (log probabilities) of the partial hypotheses.
(Figure, built up step by step over several slides:)
• Start at <START>: calculate the prob dist of the next word, take the top k words and compute scores:
    -0.7 = log P_LM(he | <START>)        -0.9 = log P_LM(I | <START>)
• For each of the k hypotheses, find the top k next words and calculate scores:
    -1.7 = log P_LM(hit | <START> he) + -0.7      -2.9 = log P_LM(struck | <START> he) + -0.7
    -1.6 = log P_LM(was | <START> I) + -0.9       -1.8 = log P_LM(got | <START> I) + -0.9
• Of these k² hypotheses, just keep the k with the highest scores ("he hit" and "I was").
• Repeat the expand-and-prune step, e.g.:
    -2.8 = log P_LM(a | <START> he hit) + -1.7    -2.5 = log P_LM(me | <START> he hit) + -1.7
    -2.9 = log P_LM(hit | <START> I was) + -1.6   -3.8 = log P_LM(struck | <START> I was) + -1.6
  and continue through "he hit me" → "he hit me with" / "he hit me on" → "he hit me with a" → "he hit me with a pie" / "he hit me with a tart".
• "he hit me with a pie" is the top-scoring hypothesis!
• Backtrack to obtain the full hypothesis.
Beam search decoding: stopping criterion
• In greedy decoding, usually we decode until the model produces an <END> token
  – For example: <START> he hit me with a pie <END>
• In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
  – When a hypothesis produces <END>, that hypothesis is complete.
  – Place it aside and continue exploring other hypotheses via beam search.
• Usually we continue beam search until:
  – We reach timestep T (where T is some pre-defined cutoff), or
  – We have at least n completed hypotheses (where n is a pre-defined cutoff)
Beam search decoding: finishing up
• We have our list of completed hypotheses.
• How to select the top one with the highest score?
• Each hypothesis y_1, …, y_t on our list has a score:
    score(y_1, …, y_t) = log P_LM(y_1, …, y_t | x) = Σ_{i=1}^{t} log P_LM(y_i | y_1, …, y_{i-1}, x)
• Problem with this: longer hypotheses have lower scores
• Fix: Normalize by length. Use this to select the top one instead:
    (1/t) Σ_{i=1}^{t} log P_LM(y_i | y_1, …, y_{i-1}, x)
Advantages of NMT
Compared to SMT, NMT has many advantages:
• Better performance
  – More fluent
  – Better use of context
  – Better use of phrase similarities
• A single neural network to be optimized end-to-end
  – No subcomponents to be individually optimized
• Requires much less human engineering effort
  – No feature engineering
  – Same method for all language pairs
Disadvantages of NMT?
Compared to SMT:
• NMT is less interpretable
  – Hard to debug
• NMT is difficult to control
  – For example, can't easily specify rules or guidelines for translation
  – Safety concerns!
How do we evaluate Machine Translation?
BLEU (Bilingual Evaluation Understudy) – you'll see BLEU in detail in Assignment 4!
• BLEU compares the machine-written translation to one or several human-written translation(s), and computes a similarity score based on:
  – n-gram precision (usually for 1-, 2-, 3- and 4-grams)
  – Plus a penalty for too-short system translations
• BLEU is useful but imperfect
  – There are many valid ways to translate a sentence
  – So a good translation can get a poor BLEU score because it has low n-gram overlap with the human translation :(
Source: "BLEU: a Method for Automatic Evaluation of Machine Translation", Papineni et al., 2002. /anthology/P02-1040
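A simplified single-reference BLEU sketch, just to illustrate the mechanics (no smoothing; real evaluations use a standard implementation such as sacrebleu):

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(hyp, ref, max_n=4):
        precisions = []
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
            # Modified n-gram precision: clip hypothesis counts by reference counts
            overlap = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            precisions.append(overlap / max(sum(hyp_ngrams.values()), 1))
        if min(precisions) == 0:
            return 0.0
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
        # Brevity penalty: penalize system translations that are too short
        bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
        return bp * geo_mean

    print(bleu("he hit me with a pie".split(), "he hit me with a pie".split()))  # 1.0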
MT progress over time
[Edinburgh En-De WMT newstest2013 Cased BLEU; NMT 2015 from U. Montréal; NMT 2019 FAIR on newstest2019]
(Figure: BLEU scores from 2013 to 2019 for Phrase-based SMT, Syntax-based SMT, and Neural MT, with Neural MT improving rapidly and overtaking both SMT approaches.)
Sources: http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf & /
NMT: perhaps the biggest success story of NLP Deep Learning?
Neural Machine Translation went from a fringe research attempt in 2014 to the leading standard method in 2016
• 2014: First seq2seq paper published
• 2016: Google Translate switches from SMT to NMT – and by 2018 everyone has
• This is amazing!
  – SMT systems, built by hundreds of engineers over many years, were outperformed by NMT systems trained by a small group of engineers in a few months
So, is Machine Translation solved?
• Nope!
• Many difficulties remain:
  – Out-of-vocabulary words
  – Domain mismatch between train and test data
  – Maintaining context over longer text
  – Low-resource language pairs
  – Failures to accurately capture sentence meaning
  – Pronoun (or zero pronoun) resolution errors
  – Morphological agreement errors
Further reading: "Has AI surpassed humans at translation? Not even close!" /editorials/state_of_nmt
So is Machine Translation solved?
• Nope!
• Using common sense is still hard
So is Machine Translation solved?
• Nope!
• NMT picks up biases in training data
  (Figure: translations assign genders even though the source didn't specify gender)
Source: /bias-sexist-or-this-is-the-way-it-should-be-ce1f7c8c683c
So is Machine Translation solved?
• Nope!
• Uninterpretable systems do strange things
  – (But I think this problem had been fixed in Google Translate by 2021?)
Picture source: /en_uk/article/j5npeg/why-is-google-translate-spitting-out-sinister-religious-prophecies
Explanation: /briefs/google-nmt-prophecies
NMT research continues
NMT is a flagship task for NLP Deep Learning
• NMT research has pioneered many of the recent innovations of NLP Deep Learning
• In 2021: NMT research continues to thrive
• Researchers have found many, many improvements to the "vanilla" seq2seq NMT system we've just presented
• But we'll present in a minute one improvement so integral that it is the new vanilla…
ATTENTION
Assignment 4: Cherokee-English machine translation!
• Cherokee is an endangered Native American language – about 2,000 fluent speakers. Extremely low resource: about 20k parallel sentences available, most from the Bible
  ᎪᎯᎩᏴᏥᎨᏒᎢᎦᎵᏉᎩᎢᏯᏂᎢᎠᏂᏧᏣ.ᏂᎪᎯᎸᎢᏗᎦᎳᏫᎢᏍᏗᎢᏩᏂᏯᎡᎢᏓᎾᏁᎶᎲᏍᎬᎢᏅᏯᎪᏢᏔᏅᎢᎦᏆᏗᎠᏂᏐᏆᎴᎵᏙᎲᎢᎠᎴᎤᏓᏍᏈᏗᎦᎾᏍᏗᎠᏅᏗᏍᎨᎢᎠᏅᏂᎲᎢ.
  "Long ago were seven boys who used to spend all their time down by the townhouse playing games, rolling a stone wheel along the ground, sliding and striking it with a stick."
• The writing system is a syllabary with a symbol for each CV unit (85 letters)
• Many thanks to Shiyue Zhang, Benjamin Frey, and Mohit Bansal from UNC Chapel Hill for the resources for this assignment!
• Cherokee is not available on Google Translate!
Cherokee
• The Cherokee originally lived in western North Carolina and eastern Tennessee. Most speakers are now in Oklahoma, following the Trail of Tears; some remain in NC
• Writing system invented by Sequoyah around 1820 – someone who was previously illiterate
• Very effective: in the following decades, Cherokee literacy was higher than for white people in the southeastern United States
Section 3: Attention
Sequence-to-sequence: the bottleneck problem
(Figure: the Encoder RNN encodes the source sentence (input) "il a m'entarté" into a single encoding of the source sentence, and the Decoder RNN generates the target sentence (output) "he hit me with a pie <END>" from "<START> he hit me with a pie".)
Problems with this architecture?
• The encoding needs to capture all information about the source sentence. Information bottleneck!
Attention
• Attention provides a solution to the bottleneck problem.
• Core idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence
• First, we will show this via a diagram (no equations), then we will show it with equations
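As a preview of the upcoming equations, here is a minimal sketch of dot-product attention for one decoder step, assuming encoder hidden states enc_h of shape (src_len, h) and a decoder hidden state dec_h of shape (h,):

    import torch
    import torch.nn.functional as F

    def dot_product_attention(dec_h, enc_h):
        scores = enc_h @ dec_h              # attention scores: one dot product per source position
        alpha = F.softmax(scores, dim=0)    # attention distribution over the source positions
        context = alpha @ enc_h             # attention output: weighted sum of encoder states
        # The context is typically concatenated with dec_h to predict the next word.
        return context, alpha

    enc_h = torch.randn(5, 256)             # 5 source words, hidden size 256
    dec_h = torch.randn(256)
    context, alpha = dot_product_attention(dec_h, enc_h)
    print(alpha.shape, context.shape)       # torch.Size([5]) torch.Size([256])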
Sequence-to-sequence with attention
(Figure, built up over several decoder steps: on each step, attention scores are computed as dot products between the decoder hidden state and each Encoder RNN hidden state for the source sentence (input) "il a m'entarté"; the Decoder RNN, starting from <START>, uses these scores when generating the next word.)