Dependency Parsing by Belief Propagation
David A. Smith (JHU → UMass Amherst) and Jason Eisner (Johns Hopkins University)

Outline
- Edge-factored parsing [old]
  - Dependency parses
  - Scoring the competing parses: edge features
  - Finding the best parse
- Higher-order parsing [new!]
  - Throwing in more features: graphical models
  - Finding the best parse: belief propagation
  - Experiments
  - Conclusions
Word Dependency Parsing (slide adapted from Yuji Matsumoto)
- Raw sentence: "He reckons the current account deficit will narrow to only 1.8 billion in September."
- Part-of-speech tagging: PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .
- Word dependency parsing: attach each word to a head, with arc labels such as SUBJ, SPEC, MOD, COMP, S-COMP, and ROOT.

What does parsing have to do with belief propagation? This talk's answer: run loopy belief propagation over the parse's variables.
Great Ideas in NLP: Log-Linear Models (Berger, della Pietra & della Pietra 1996; Darroch & Ratcliff 1972)
- In the beginning, we used generative models:
  p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * ...
  Each choice depends on a limited part of the history. But which dependencies should we allow? What if they're all worthwhile? Do we pick p(D | A,B,C), or maybe p(D | A,B) * p(C | A,B,D)?
- Solution: log-linear (max-entropy) modeling. Which dependencies should we allow, given limited training data? Throw them all in:
  (1/Z) * Φ(A) * Φ(B,A) * Φ(C,A) * Φ(C,B) * Φ(D,A,B) * Φ(D,B,C) * Φ(D,A,C) * ...
  Features may interact in arbitrary ways, and iterative scaling keeps adjusting the feature weights until the model agrees with the training data.

How about structured outputs?
- Log-linear models are great for n-way classification.
- They are also good for predicting sequences ("find preferred tags": v, a, n), but to allow fast dynamic programming we only use n-gram features.
- They are also good for dependency parsing ("find preferred links"), but to allow fast dynamic programming or MST parsing we only use single-edge features.
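To make the contrast concrete, here is a minimal brute-force sketch of a log-linear model over four binary variables. The factor definitions are invented purely for illustration (the talk gives no numbers here); the point is that overlapping nonnegative factors are multiplied and then normalized by Z, rather than chained as locally normalized conditional probabilities:

    import itertools

    # Toy log-linear model over four binary variables A, B, C, D.
    # Each factor is a nonnegative function of a subset of the variables;
    # the subsets may overlap arbitrarily, unlike a generative chain rule.
    factors = [
        lambda a, b, c, d: 2.0 if a else 1.0,               # phi(A)
        lambda a, b, c, d: 3.0 if b == a else 0.5,          # phi(B, A)
        lambda a, b, c, d: 1.5 if c and a else 1.0,         # phi(C, A)
        lambda a, b, c, d: 2.0 if c != b else 1.0,          # phi(C, B)
        lambda a, b, c, d: 1.2 if d == (a and b) else 0.8,  # phi(D, A, B)
    ]

    def unnormalized(assignment):
        """Product of all factor values on one assignment."""
        p = 1.0
        for phi in factors:
            p *= phi(*assignment)
        return p

    # Z sums the factor product over all 2^4 assignments; then
    # p(a, b, c, d) = unnormalized((a, b, c, d)) / Z.
    space = list(itertools.product([False, True], repeat=4))
    Z = sum(unnormalized(v) for v in space)
    print(unnormalized((True, True, False, True)) / Z)

Of course, enumerating assignments only works for toy problems; the rest of the talk is about structured output spaces where Z must be computed by dynamic programming or other combinatorial algorithms.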
Edge-Factored Parsers (McDonald et al. 2005)
- Czech example: "Byl jasný studený dubnový den a hodiny odbíjely třináctou."
  Gloss: "It was a bright cold day in April and the clocks were striking thirteen."
  POS tags: V A A A N J N V C
- Is jasný → den a good edge? Yes, lots of green: many well-trained features fire on it.
  - jasný → den ("bright day")
  - jasný → N ("bright NOUN")
  - A → N
  - A → N, preceding a conjunction
- How about the competing edge jasný → hodiny? Not as good, lots of red.
  - jasný → hodiny ("bright clocks"): undertrained
  - jasn → hodi ("bright clock", stems only)
  - A(singular) → N(plural): the adjective and noun disagree in number
  - A → N, where the N follows a conjunction
- So which edge is better, "bright day" or "bright clocks"?
  The score of an edge e is score(e) = θ · features(e), where θ is our current weight vector. Standard algorithms then find a valid parse with maximum total edge score. (A scoring sketch follows below.)
- Hard constraints on valid trees mean that edges compete:
  - can't have both: one parent per word
  - can't have both: no crossing links
  - can't have all three: no cycles
  Thus, an edge may lose (or win) because of a consensus of other edges.
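A hedged sketch of edge-factored scoring. The feature templates below merely echo the slide's examples (word pair, crude 4-character "stems", tag pair, tag pair in a conjunction context, written without diacritics); the real systems use much larger template inventories:

    from collections import namedtuple

    Word = namedtuple("Word", "form tag")

    def edge_features(sent, h, d):
        """Feature names that fire on the candidate link sent[h] -> sent[d]."""
        head, dep = sent[h], sent[d]
        feats = [
            f"pair:{head.form}>{dep.form}",          # e.g. jasny > den
            f"stem:{head.form[:4]}>{dep.form[:4]}",  # crude stems, e.g. jasn > hodi
            f"tags:{head.tag}>{dep.tag}",            # e.g. A > N
        ]
        # contextual tag-pair feature: does a conjunction intervene?
        if any(w.tag == "J" for w in sent[min(h, d) + 1 : max(h, d)]):
            feats.append(f"tags+conj:{head.tag}>{dep.tag}")
        return feats

    def edge_score(theta, sent, h, d):
        """score(e) = theta . features(e), with theta a sparse weight dict."""
        return sum(theta.get(f, 0.0) for f in edge_features(sent, h, d))

    # "bright day" vs. "bright clocks" from the Czech example
    # (weights invented for illustration):
    sent = [Word("Byl", "V"), Word("jasny", "A"), Word("studeny", "A"),
            Word("dubnovy", "A"), Word("den", "N"), Word("a", "J"),
            Word("hodiny", "N")]
    theta = {"pair:jasny>den": 2.0, "tags:A>N": 1.0, "tags+conj:A>N": -1.5}
    print(edge_score(theta, sent, 1, 4))  # jasny -> den: 3.0
    print(edge_score(theta, sent, 1, 6))  # jasny -> hodiny, crosses "a": -0.5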
Finding the Highest-Scoring Parse
- Example: "The cat in the hat wore a stovepipe." plus a ROOT token. (Let's vertically stretch this graph drawing.)
- One approach: convert to a context-free grammar (CFG), then use dynamic programming; each subtree is a linguistic constituent (here, a noun phrase).
- The CKY algorithm for CFG parsing is O(n^3); unfortunately, it is O(n^5) in this case. To score the cat → wore link, it is not enough to know the subtree is an NP; we must know it is rooted at "cat". So the nonterminal set expands by a factor of O(n) (NP_the, NP_cat, NP_hat, ...), and CKY's "grammar constant" is no longer constant.
- Solution: use a different decomposition (Eisner 1996), which gets us back to O(n^3). (A sketch of such a parser appears after the next two slides.)

Spans vs. Constituents: two kinds of substring
- A constituent of the tree links to the rest of the sentence only through its headword (root).
- A span of the tree links to the rest of the sentence only through its endwords.

Decomposing a tree into spans
- "The cat in the hat wore a stovepipe. ROOT" splits into overlapping spans that share single endwords: for example, "The cat" + "cat in the hat wore" + "wore a stovepipe. ROOT", where "cat in the hat wore" in turn splits as "cat in" + "in the hat" + "hat wore".
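Below is a compact sketch (my own reconstruction, not the authors' code) of a first-order projective parser in the O(n^3) span style of Eisner (1996), using the usual complete/incomplete items with backpointers:

    def eisner_parse(score):
        """First-order projective dependency parsing in O(n^3) (Eisner 1996).
        score[h][d] is the score of the link h -> d; word 0 acts as ROOT.
        Returns parent[], with parent[0] == -1."""
        n = len(score)
        NEG = float("-inf")
        # I[s][t][d]: best "incomplete" span s..t (link s -> t if d == 1,
        # link t -> s if d == 0).  C[s][t][d]: best "complete" span headed
        # at s (d == 1) or at t (d == 0).  Ib/Cb hold split backpointers.
        I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        Ib = [[[0, 0] for _ in range(n)] for _ in range(n)]
        Cb = [[[0, 0] for _ in range(n)] for _ in range(n)]
        for s in range(n):
            C[s][s][0] = C[s][s][1] = 0.0
        for width in range(1, n):
            for s in range(n - width):
                t = s + width
                # join two facing complete spans, then add link s->t or t->s
                r = max(range(s, t), key=lambda r: C[s][r][1] + C[r + 1][t][0])
                inner = C[s][r][1] + C[r + 1][t][0]
                I[s][t][1] = inner + score[s][t]; Ib[s][t][1] = r
                I[s][t][0] = inner + score[t][s]; Ib[s][t][0] = r
                # extend an incomplete span with a complete one
                r = max(range(s + 1, t + 1), key=lambda r: I[s][r][1] + C[r][t][1])
                C[s][t][1] = I[s][r][1] + C[r][t][1]; Cb[s][t][1] = r
                r = max(range(s, t), key=lambda r: C[s][r][0] + I[r][t][0])
                C[s][t][0] = C[s][r][0] + I[r][t][0]; Cb[s][t][0] = r
        parent = [-1] * n
        def walk_c(s, t, d):          # recover links from a complete span
            if s == t:
                return
            r = Cb[s][t][d]
            if d == 1:
                walk_i(s, r, 1); walk_c(r, t, 1)
            else:
                walk_c(s, r, 0); walk_i(r, t, 0)
        def walk_i(s, t, d):          # an incomplete span fixes one link
            parent[t if d == 1 else s] = s if d == 1 else t
            r = Ib[s][t][d]
            walk_c(s, r, 1); walk_c(r + 1, t, 0)
        walk_c(0, n - 1, 1)           # ROOT heads the full right span
        return parent

    # made-up scores for ROOT + three words; the best tree is the chain
    scores = [[0, 5, 1, 1], [0, 0, 4, 2], [0, 0, 0, 3], [0, 1, 0, 0]]
    print(eisner_parse(scores))       # [-1, 0, 1, 2]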
Finding the Highest-Scoring Parse (continued)
- We can play the usual tricks for dynamic programming parsing:
  - Further refine the constituents or spans, allowing the probability model to keep track of even more internal information.
  - A*, best-first, and coarse-to-fine search.
  - Training by EM etc., which requires "outside" probabilities of constituents, spans, or links.

Hard Constraints on Valid Trees (recap)
- score(e) = θ · features(e); standard algorithms find a valid parse with maximum total score, subject to one parent per word, no crossing links, and no cycles. An edge may lose (or win) because of a consensus of other edges.
- But one of those constraints deserves a second look.
Non-Projective Parses
- "No crossing links" is the projectivity restriction. Do we really want it?
- English: "I 'll give a talk tomorrow on bootstrapping. ROOT": the subtree rooted at "talk" is a discontiguous noun phrase ("a talk ... on bootstrapping"), so its arc crosses the give → tomorrow arc. (See the projectivity test sketched below.)
- Latin: "ista meam norit gloria canitiem" (that-NOM my-ACC may-know glory-NOM going-gray-ACC), i.e., "That glory may-know my going-gray" (it shall last till I go gray).
- Non-projectivity is occasional in English but frequent in Latin, etc.
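For intuition, a small sketch of the projectivity test itself: a parse is projective iff no two dependency arcs cross when drawn above the sentence. The word indices and head choices below are mine, mimicking the talk's bootstrapping example:

    def is_projective(parent):
        """parent[i] is the head of word i, or -1 for the sentence root."""
        arcs = [(min(i, h), max(i, h)) for i, h in enumerate(parent) if h >= 0]
        for a, b in arcs:
            for c, d in arcs:
                if a < c < b < d:  # one endpoint inside (a, b), one outside
                    return False
        return True

    #         I  'll  give  a  talk  tomorrow  on  bootstrapping
    #         0   1    2    3   4       5       6       7
    parent = [2,  2,  -1,   4,  2,      2,      4,      6]
    print(is_projective(parent))  # False: arc (4, 6) crosses arc (2, 5)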
Finding the Highest-Scoring Non-Projective Tree (slide thanks to Dragomir Radev)
- Consider the sentence "John saw Mary". (Figure: a dense weighted graph of candidate arcs over root, John, saw, Mary on the left; the selected spanning tree root → saw → {John, Mary} on the right.)
- The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree, which may be non-projective, in time O(n^2): every node selects its best parent; if cycles form, contract them and repeat.

Summing over All Non-Projective Trees
- How about the total weight Z of all trees? How about outside probabilities, or gradients for training?
- These can be found in time O(n^3) by matrix determinants and inverses (Smith & Smith, 2007).
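Here is a compact recursive sketch of Chu-Liu-Edmonds (my reconstruction, naively O(n^3); careful implementations reach O(n^2)): greedily pick best parents, and if a cycle appears, contract it and recurse. The edge weights in the usage example are illustrative, chosen to mirror the slide's figure:

    def chu_liu_edmonds(score, root=0):
        """Maximum-weight spanning arborescence.  score[h][d] is the weight
        of edge h -> d (float('-inf') if absent).  parent[root] == -1."""
        n = len(score)
        # 1. Every non-root node selects its best parent.
        parent = [-1] * n
        for d in range(n):
            if d != root:
                parent[d] = max((h for h in range(n) if h != d),
                                key=lambda h: score[h][d])
        # 2. Look for a cycle among the selected edges.
        cycle = None
        for start in range(n):
            chain, pos, v = [], {}, start
            while v != -1 and v not in pos:
                pos[v] = len(chain); chain.append(v); v = parent[v]
            if v != -1 and v in pos:
                cycle = chain[pos[v]:]
                break
        if cycle is None:
            return parent              # the greedy choice is already a tree
        # 3. Contract the cycle to a single node c and recurse.
        in_cycle = set(cycle)
        outside = [v for v in range(n) if v not in in_cycle]
        idx = {v: i for i, v in enumerate(outside)}
        c = len(outside)
        NEG = float("-inf")
        sub = [[NEG] * (c + 1) for _ in range(c + 1)]
        enter, leave = {}, {}
        for u in outside:
            for w in outside:
                if u != w:
                    sub[idx[u]][idx[w]] = score[u][w]
            # entering the cycle at v displaces the cycle edge parent[v] -> v
            best = max(cycle, key=lambda v: score[u][v] - score[parent[v]][v])
            sub[idx[u]][c] = score[u][best] - score[parent[best]][best]
            enter[u] = best
            # leaving the cycle toward u: use the best cycle node
            best = max(cycle, key=lambda v: score[v][u])
            sub[c][idx[u]] = score[best][u]
            leave[u] = best
        sub_parent = chu_liu_edmonds(sub, root=idx[root])
        # 4. Expand: keep cycle edges except the one the new parent displaces.
        result = [-1] * n
        for v in cycle:
            result[v] = parent[v]
        for w in outside:
            if w != root:
                p = sub_parent[idx[w]]
                result[w] = leave[w] if p == c else outside[p]
        a = outside[sub_parent[c]]
        result[enter[a]] = a
        return result

    NEG = float("-inf")
    #        root  John  saw  Mary
    score = [[NEG,   9,  10,    9],
             [NEG, NEG,  20,    3],
             [NEG,  30, NEG,   30],
             [NEG,  11,   0, NEG]]
    print(chu_liu_edmonds(score))  # [-1, 2, 0, 2]: root -> saw -> {John, Mary}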
Graph Theory to the Rescue!
- Tutte's Matrix-Tree Theorem (1948): the determinant of the Kirchhoff (aka Laplacian) adjacency matrix of a directed graph G, with row and column r struck out, equals the sum of scores of all directed spanning trees of G rooted at node r.
- That is exactly the Z we need, in O(n^3) time!

Building the Kirchhoff (Laplacian) Matrix
- Negate the edge scores.
- Sum the columns (children) onto the diagonal.
- Strike out the root row and column.
- Take the determinant. (See the sketch below.)
- N.B.: This allows multiple children of the root, but see Koo et al. 2007.

Why Should This Work?
- By analogy with Chu-Liu-Edmonds: every node selects its best parent; if cycles form, contract and recur.
- The theorem is clear for a 1x1 matrix; proceed by induction. The undirected case is classical; the directed case needs special handling at the root.
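The four steps above translate almost line-for-line into code. A small NumPy sketch, assuming multiplicative edge weights (e.g., exponentiated log-linear edge scores):

    import numpy as np

    def spanning_tree_Z(weight, root=0):
        """Sum of products of edge weights over all directed spanning trees
        rooted at `root` (Tutte's Matrix-Tree Theorem).
        weight[h][d] >= 0 is the multiplicative weight of edge h -> d."""
        A = np.asarray(weight, dtype=float).copy()
        np.fill_diagonal(A, 0.0)
        L = -A                               # negate edge scores...
        np.fill_diagonal(L, A.sum(axis=0))   # ...sum columns onto the diagonal
        keep = [i for i in range(len(A)) if i != root]
        # strike the root row and column, then take the determinant
        return np.linalg.det(L[np.ix_(keep, keep)])

    # Sanity check: with all weights 1 on three nodes, there are exactly
    # 3 spanning trees rooted at node 0 (Cayley's formula: 3^(3-2) = 3).
    print(spanning_tree_Z(np.ones((3, 3))))  # 3.0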
Exactly Finding the Best Parse
- With single-edge features only: projective parsing is O(n^3) by dynamic programming, and non-projective parsing is O(n^2) by maximum spanning tree.
- With arbitrary features, the runtime blows up:
  - grandparents: O(n^4)
  - grandparent + sibling bigrams: O(n^5)
  - POS trigrams: O(n^3 g^6)
  - sibling pairs (non-adjacent): O(2^n)
  - and, in the non-projective case, any of the above features, soft penalties for crossing links, or pretty much anything else: NP-hard!

Let's Reclaim Our Freedom (again!)
- The output probability is a product of local factors; throw in any factors we want! (A log-linear model.)
- How could we find the best parse?
  - Integer linear programming (Riedel et al., 2006), which doesn't give us probabilities when training or parsing.
  - MCMC: slow to mix? High rejection rate because of the hard TREE constraint?
  - Greedy hill-climbing (McDonald & Pereira 2006).
- None of these exploit the tree structure of parses as the first-order methods do.
This Paper in a Nutshell
- The output probability is a product of local factors, (1/Z) * Φ(A) * Φ(B,A) * Φ(C,A) * Φ(C,B) * Φ(D,A,B) * Φ(D,B,C) * Φ(D,A,C) * ..., and we throw in any factors we want (a log-linear model).
- We let the local factors negotiate via "belief propagation": links (and tags) reinforce or suppress one another.
- Each iteration takes total time O(n^2) or O(n^3), and the result converges to a pretty good (but approximate) global parse.
- Certain global factors are OK too: each global factor can be handled fast via some traditional parsing algorithm (e.g., inside-outside).
- Training vs. decoding with many features [new!]:

  Training with many features             Decoding with many features
  Iterative scaling                       Belief propagation
  Each weight in turn is influenced       Each variable in turn is influenced
    by the others                           by the others
  Iterate to achieve globally             Iterate to achieve locally
    optimal weights                         consistent beliefs
  To train the distribution over trees,   To decode the distribution over trees,
    use dynamic programming to compute      use dynamic programming to compute
    the normalizer Z                        the messages
Local Factors in a Graphical Model: first, a familiar example
- A Conditional Random Field (CRF) for POS tagging of "find preferred tags".
- The observed input sentence is shaded; a possible tagging is an assignment to the remaining variables, e.g. v v v, or v a n.
- A binary factor measures the compatibility of two adjacent tags; the model reuses the same parameters at each position:

        v  n  a
    v   0  2  1
    n   2  1  0
    a   0  3  1

- A unary factor evaluates each tag. Its values depend on the corresponding word (and could be made to depend on the entire observed sentence), so there is a different unary factor at each position:

    find:       v 0.3   n 0.02  a 0
    preferred:  v 0.3   n 0     a 0.1
    tags:       v 0.2   n 0.2   a 0    (can't be an adjective)

- p(v a n) is proportional to the product of all the factors' values on the assignment v a n:
  p(v a n) ∝ 1 * 3 * 0.3 * 0.1 * 0.2
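The slide's numbers drop straight into code. A brute-force sketch that reproduces p(v a n) ∝ 1 · 3 · 0.3 · 0.1 · 0.2 and normalizes by enumerating all 27 taggings (enumeration stands in for the forward-backward dynamic programming a real CRF would use):

    import itertools

    TAGS = "vna"
    # Binary factor (same at every adjacent pair): rows = left tag, cols = right.
    binary = {"v": {"v": 0, "n": 2, "a": 1},
              "n": {"v": 2, "n": 1, "a": 0},
              "a": {"v": 0, "n": 3, "a": 1}}
    # One unary factor per word of "find preferred tags".
    unary = [{"v": 0.3, "n": 0.02, "a": 0},    # find
             {"v": 0.3, "n": 0,    "a": 0.1},  # preferred
             {"v": 0.2, "n": 0.2,  "a": 0}]    # tags (can't be an adjective)

    def product_of_factors(tags):
        p = 1.0
        for i, t in enumerate(tags):
            p *= unary[i][t]
        for left, right in zip(tags, tags[1:]):
            p *= binary[left][right]
        return p

    print(product_of_factors("van"))      # 1 * 3 * 0.3 * 0.1 * 0.2 = 0.018
    Z = sum(product_of_factors(t) for t in itertools.product(TAGS, repeat=3))
    print(product_of_factors("van") / Z)  # the actual probability p(v a n)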
Local Factors in a Graphical Model: now let's do dependency parsing!
- For "find preferred links", introduce O(n^2) boolean variables, one for each possible link.
- A possible parse is encoded as an assignment to these variables (e.g. t f f t f f); a different assignment encodes another possible parse.
- But some assignments are illegal parses: one contains a cycle; another gives a word multiple parents. (A legality check is sketched below.)
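A sketch of the legality check the pictures are making: interpret an assignment to the O(n^2) link variables as a graph and test the hard constraints (at most one parent per word, exactly one word attached to an implicit ROOT, no cycles). This assumes a single-root convention; allowing several ROOT attachments would accept more assignments:

    import itertools

    n = 3                        # the three words of "find preferred links"
    pairs = [(h, d) for h in range(n) for d in range(n) if h != d]  # 6 links

    def is_tree(assignment):
        """0/1 value of the hard TREE factor on one assignment (True = on)."""
        parent = {}
        for (h, d), on in zip(pairs, assignment):
            if on:
                if d in parent:              # multiple parents: illegal
                    return False
                parent[d] = h
        if sum(1 for w in range(n) if w not in parent) != 1:
            return False                     # exactly one word hangs off ROOT
        for w in range(n):                   # follow heads upward; loop = cycle
            seen, v = set(), w
            while v in parent:
                if v in seen:
                    return False
                seen.add(v); v = parent[v]
        return True

    legal = [a for a in itertools.product([False, True], repeat=len(pairs))
             if is_tree(a)]
    print(len(legal))            # 9 of the 64 assignments are legal trees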
Local Factors for Parsing
- So what factors shall we multiply to define the parse probability?
- Unary factors evaluate each link in isolation, with one value for t and one for f (e.g. t 2 / f 1 for a plausible link, down to t 1 / f 8 for an implausible one). As before, the goodness of a link can depend on the entire observed input context: some other links just aren't as good given this input sentence.
- But what if the best assignment isn't a tree?

Global Factors for Parsing
- Add a global TREE factor requiring that the links form a legal tree. This is a "hard constraint": the factor is either 0 or 1. Over 6 link variables it is a table with 64 entries, e.g. f f f f f f → 0; f f f f f t → 0; f f f f t f → 0; f f t f f t → 1 (we're legal!); ...; t t t t t t → 0.
- So far, this is equivalent to edge-factored parsing (McDonald et al. 2005). Note that McDonald et al. (2005) don't loop through this table to consider exponentially many trees one at a time: they use combinatorial algorithms, and so should we! (The brute-force version below is for intuition only.)
- Optionally, the TREE factor can also require the tree to be projective (no crossing links).
- Second-order effects: factors on 2 variables, e.g. a grandparent factor.
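For intuition only (as the slide warns, real parsers use combinatorial algorithms such as the Matrix-Tree construction above, never this exponential loop), the edge-factored distribution can be assembled by brute force from the unary link factors and the TREE factor. The (t, f) values below reuse the slide's numbers, though which value belongs to which link is my guess; `pairs` and `is_tree` come from the sketch above:

    # Unary factor values (value-if-true, value-if-false) for the 6 links,
    # echoing the slide's tables; the assignment to links is hypothetical.
    unary = [(2, 1), (1, 2), (1, 2), (1, 6), (1, 3), (1, 8)]

    def weight(assignment):
        """Product of the unary factors times the 0/1 TREE factor."""
        w = 1.0
        for (t_val, f_val), on in zip(unary, assignment):
            w *= t_val if on else f_val
        return w * is_tree(assignment)

    space = list(itertools.product([False, True], repeat=len(pairs)))
    Z = sum(weight(a) for a in space)        # only the 9 legal trees contribute
    best = max(space, key=weight)
    print(best, weight(best) / Z)            # the MAP parse and its probability

Belief propagation, the subject of the remainder of the talk, replaces this enumeration with messages passed between the link variables and the factors, including the global TREE factor.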