版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、1David A. Smith (JHU UMass Amherst)Jason Eisner (Johns Hopkins University)Dependency Parsingby Belief PropagationOutlineEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding the be
2、st parse: Belief propagationExperimentsConclusionsNew!OldOutlineEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding the best parse: Belief propagationExperimentsConclusionsNew!Ol
3、dMODWord Dependency ParsingHe reckons the current account deficit will narrow to only 1.8 billion in September.Raw sentencePart-of-speech tagging He reckons the current account deficit will narrow to only 1.8 billion in September. PRP VBZ DT JJ NN NN MD VB TO RB CD CD IN NNP .POS-tagged sentenceWord
4、 dependency parsingslide adapted from Yuji MatsumotoWord dependency parsed sentenceHe reckons the current account deficit will narrow to only billion in September .SUBJROOTS-COMPSUBJSPECMODMODCOMPCOMPWhat does parsing have to do with belief propagation?loopy belief propagationbeliefloopypropagationO
5、utlineEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding the best parse: Belief propagationExperimentsConclusionsNew!Old7Great ideas in NLP: Log-linear models (Berger, della Pie
6、tra, della Pietra 1996; Darroch & Ratcliff 1972)In the beginning, we used generative models.p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * each choice depends on a limited part of the historybut which dependencies to allow?what if theyre all worthwhile?p(D | A,B,C)?p(D | A,B,C)? p(D | A,B) * p(C | A,
7、B,D)?8Great ideas in NLP: Log-linear models (Berger, della Pietra, della Pietra 1996; Darroch & Ratcliff 1972)In the beginning, we used generative models.Solution: Log-linear (max-entropy) modelingFeatures may interact in arbitrary waysIterative scaling keeps adjusting the feature weightsuntil the m
8、odel agrees with the training data.p(A) * p(B | A) * p(C | A,B) * p(D | A,B,C) * which dependencies to allow? (given limited training data)(1/Z) * (A) * (B,A) * (C,A) * (C,B) * (D,A,B) * (D,B,C) * (D,A,C) * throw them all in!Log-linear models great for n-way classificationAlso good for predicting se
9、quencesAlso good for dependency parsing9How about structured outputs?but to allow fast dynamic programming, only use n-gram featuresbut to allow fast dynamic programming or MST parsing,only use single-edge featuresfind preferred linksfindpreferredtagsvan10How about structured outputs?but to allow fa
10、st dynamic programming or MST parsing,only use single-edge featuresfind preferred linksEdge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingIs this a good edge?yes, lots of green .Edge-Factored Parsers (McD
11、onald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingIs this a good edge?jasn den(“bright day)Edge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainth
12、estrikingIs this a good edge?jasn den(“bright day)jasn N(“bright NOUN)VAAANJNVCEdge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingIs this a good edge?jasn den(“bright day)jasn N(“bright NOUN)VAAANJNVCA NE
13、dge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingIs this a good edge?jasn den(“bright day)jasn N(“bright NOUN)VAAANJNVCA Npreceding conjunctionA NEdge-Factored Parsers (McDonald et al. 2005)Byljasnstuden
14、dubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingHow about this competing edge?VAAANJNVCnot as good, lots of red .Edge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestriki
15、ngHow about this competing edge?VAAANJNVCjasn hodiny(“bright clocks). undertrained .Edge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingHow about this competing edge?VAAANJNVCbyljasnstuddubndenahodiodbtinj
16、asn hodiny(“bright clocks”). undertrained .jasn hodi(“bright clock,stems only)Edge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingHow about this competing edge?VAAANJNVCjasn hodi(“bright clock,stems only)b
17、yljasnstuddubndenahodiodbtinAplural Nsingular jasn hodiny(“bright clocks”). undertrained .jasn hodiny(“bright clocks”). undertrained .Edge-Factored Parsers (McDonald et al. 2005)Byljasnstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingHow about this comp
18、eting edge?VAAANJNVCjasn hodi(“bright clock,stems only)byljasnstuddubndenahodiodbtinAplural Nsingular A N where N followsa conjunctionjasnEdge-Factored Parsers (McDonald et al. 2005)Bylstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingVAAANJNVCbyljasnstu
19、ddubndenahodiodbtinWhich edge is better?“bright day or “bright clocks?jasnEdge-Factored Parsers (McDonald et al. 2005)Bylstudendubnovdenahodinyodbjelytinctou“ItbrightcolddayAprilandclockswerethirteen”wasainthestrikingVAAANJNVCbyljasnstuddubndenahodiodbtinWhich edge is better?Score of an edge e = fea
20、tures(e)Standard algos valid parse with max total scoreour current weight vectorEdge-Factored Parsers (McDonald et al. 2005)Which edge is better?Score of an edge e = features(e)Standard algos valid parse with max total scoreour current weight vectorcant have both(one parent per word)cant have both(n
21、o crossing links)Cant have all three(no cycles)Thus, an edge may lose (or win) because of a consensus of other edges. OutlineEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding t
22、he best parse: Belief propagationExperimentsConclusionsNew!OldFinding Highest-Scoring ParseThe cat in the hat wore a stovepipe. ROOTConvert to context-free grammar (CFG)Then use dynamic programmingeach subtree is a linguistic constituent(here a noun phrase)ThecatinthehatworeastovepipeROOTlets vertic
23、ally stretch this graph drawingFinding Highest-Scoring Parseeach subtree is a linguistic constituent(here a noun phrase)ThecatinthehatworeastovepipeROOTso CKYs “grammar constant is no longer constant Convert to context-free grammar (CFG)Then use dynamic programmingCKY algorithm for CFG parsing is O(
24、n3)Unfortunately, O(n5) in this caseto score “cat wore link, not enough to know this is NPmust know its rooted at “catso expand nonterminal set by O(n): NPthe, NPcat, NPhat, .Finding Highest-Scoring Parseeach subtree is a linguistic constituent(here a noun phrase)ThecatinthehatworeastovepipeROOTConv
25、ert to context-free grammar (CFG)Then use dynamic programmingCKY algorithm for CFG parsing is O(n3)Unfortunately, O(n5) in this caseSolution: Use a different decomposition (Eisner 1996)Back to O(n3)Spans vs. constituentsTwo kinds of substring.Constituent of the tree: links to the rest only through i
26、ts headword (root).Span of the tree: links to the rest only through its endwords.The cat in the hat wore a stovepipe. ROOTThe cat in the hat wore a stovepipe. ROOTDecomposing a tree into spansThe cat in the hat wore a stovepipe. ROOTThe catwore a stovepipe. ROOTcat in the hat wore+in the hat worecat
27、 in+hat worein the hat+cat in the hat wore a stovepipe. ROOTFinding Highest-Scoring ParseConvert to context-free grammar (CFG)Then use dynamic programmingCKY algorithm for CFG parsing is O(n3)Unfortunately, O(n5) in this caseSolution: Use a different decomposition (Eisner 1996)Back to O(n3)Can play
28、usual tricks for dynamic programming parsingFurther refining the constituents or spans Allow prob. model to keep track of even more internal information A*, best-first, coarse-to-fineTraining by EM etc.require “outside” probabilitiesof constituents, spans, or links Hard Constraints on Valid TreesSco
29、re of an edge e = features(e)Standard algos valid parse with max total scoreour current weight vectorcant have both(one parent per word)cant have both(no crossing links)Cant have all three(no cycles)Thus, an edge may lose (or win) because of a consensus of other edges. talkNon-Projective Parsescant
30、have both(no crossing links)The “projectivity restriction.Do we really want it?IgiveaonbootstrappingtomorrowROOTllsubtree rooted at “talkis a discontiguous noun phraseNon-Projective ParsesistameamnoritgloriacanitiemROOTIgiveaonbootstrappingtalktomorrowROOTllthatNOMmyACCmay-knowgloryNOMgoing-grayACC
31、That glory may-know my going-gray (i.e., it shall last till I go gray)occasional non-projectivity in Englishfrequent non-projectivity in Latin, etc.Finding highest-scoring non-projective treeConsider the sentence “John saw Mary (left).The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning t
32、ree (right) may be non-projective.Can be found in time O(n2).91030203001139rootJohnsawMary103030rootJohnsawMaryslide thanks to Dragomir RadevEvery node selects best parentIf cycles, contract them and repeatSumming over all non-projective treesFinding highest-scoring non-projective treeConsider the s
33、entence “John saw Mary (left).The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree (right) may be non-projective.Can be found in time O(n2).How about total weight Z of all trees?How about outside probabilities or gradients?Can be found in time O(n3) by matrix determinants and inverse
34、s (Smith & Smith, 2007).slide thanks to Dragomir RadevGraph Theory to the Rescue!Tuttes Matrix-Tree Theorem (1948)The determinant of the Kirchoff (aka Laplacian) adjacency matrix of directed graph G without row and column r is equal to the sum of scores of all directed spanning trees of G rooted at
35、node r.Exactly the Z we need!O(n3) time!36Building the Kirchoff (Laplacian) MatrixNegate edge scoresSum columns (children)Strike root row/col.Take determinantN.B.: This allows multiple children of root, but see Koo et al. 2007.37Why Should This Work?Chu-Liu-Edmonds analogy:Every node selects best pa
36、rentIf cycles, contract and recurClear for 1x1 matrix; use inductionUndirected case; special root cases for directed38OutlineEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding t
37、he best parse: Belief propagationExperimentsConclusionsNew!Old3940Exactly Finding the Best ParseWith arbitrary features, runtime blows upProjective parsing: O(n3) by dynamic programmingNon-projective: O(n2) by minimum spanning treebut to allow fast dynamic programming or MST parsing,only use single-
38、edge featuresfind preferred linksO(n4)grandparentsO(n5)grandp.+ siblingbigrams O(n3g6)POS trigrams O(2n)sibling pairs (non-adjacent)NP-hardany of the above featuressoft penalties for crossing linkspretty much anything else!41Lets reclaim our freedom (again!)Output probability is a product of local f
39、actorsThrow in any factors we want! (log-linear model)How could we find best parse?Integer linear programming (Riedel et al., 2006)doesnt give us probabilities when training or parsingMCMCSlow to mix? High rejection rate because of hard TREE constraint?Greedy hill-climbing (McDonald & Pereira 2006)T
40、his paper in a nutshell(1/Z) * (A) * (B,A) * (C,A) * (C,B) * (D,A,B) * (D,B,C) * (D,A,C) * none of these exploittree structure of parsesas the first-order methods do42Lets reclaim our freedom (again!)Output probability is a product of local factorsThrow in any factors we want! (log-linear model)Let
41、local factors negotiate via “belief propagationLinks (and tags) reinforce or suppress one another Each iteration takes total time O(n2) or O(n3)Converges to a pretty good (but approx.) global parsecertain global factors ok tooeach global factor can be handled fast via some traditional parsing algori
42、thm (e.g., inside-outside)This paper in a nutshell(1/Z) * (A) * (B,A) * (C,A) * (C,B) * (D,A,B) * (D,B,C) * (D,A,C) * Lets reclaim our freedom (again!)Training with many featuresDecoding with many featuresIterative scalingBelief propagationEach weight in turn is influenced by othersEach variable in
43、turn is influenced by othersIterate to achieve globally optimal weightsIterate to achievelocally consistent beliefsTo train distrib. over trees, use dynamic programming to compute normalizer ZTo decode distrib. over trees, use dynamic programming to compute messagesThis paper in a nutshellNew!Outlin
44、eEdge-factored parsingDependency parsesScoring the competing parses: Edge featuresFinding the best parseHigher-order parsingThrowing in more features: Graphical modelsFinding the best parse: Belief propagationExperimentsConclusionsNew!Old44First, a familiar exampleConditional Random Field (CRF) for
45、POS tagging45Local factors in a graphical modelfindpreferredtagsvvvPossible tagging (i.e., assignment to remaining variables)Observed input sentence (shaded)46Local factors in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsvanPossible tagging
46、 (i.e., assignment to remaining variables)Another possible taggingObserved input sentence (shaded)47Local factors in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsvnav 021n210a031vnav 021n210a031Binary factor that measures compatibility of 2
47、 adjacent tagsModel reusessame parametersat this position48Local factors in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsv 0.2n0.2a0“Unary factor evaluates this tagIts values depend on corresponding wordcant be adjv 0.2n0.2a049Local factors
48、 in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsv 0.2n0.2a0“Unary factor evaluates this tagIts values depend on corresponding word(could be made to depend onentire observed sentence)50Local factors in a graphical modelFirst, a familiar exa
49、mpleConditional Random Field (CRF) for POS taggingfindpreferredtagsv 0.2n0.2a0“Unary factor evaluates this tagDifferent unary factor at each positionv 0.3n0.02a0v 0.3n0a0.151Local factors in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsvnav
50、 021n210a031v 0.3n0.02a0vnav 021n210a031v 0.3n0a0.1v 0.2n0.2a0vanp(v a n) is proportionalto the product of allfactors values on v a n52Local factors in a graphical modelFirst, a familiar exampleConditional Random Field (CRF) for POS taggingfindpreferredtagsvnav 021n210a031v 0.3n0.02a0vnav 021n210a03
51、1v 0.3n0a0.1v 0.2n0.2a0van= 1*3*0.3*0.1*0.2 p(v a n) is proportionalto the product of allfactors values on v a n53First, a familiar exampleCRF for POS taggingNow lets do dependency parsing!O(n2) boolean variables for the possible linksvanLocal factors in a graphical modelfindpreferredlinksFirst, a f
52、amiliar exampleCRF for POS taggingNow lets do dependency parsing!O(n2) boolean variables for the possible links54Local factors in a graphical modelfindpreferredlinkstfftffPossible parse encoded as an assignment to these varsvanFirst, a familiar exampleCRF for POS taggingNow lets do dependency parsin
53、g!O(n2) boolean variables for the possible links55Local factors in a graphical modelfindpreferredlinksfftftfPossible parse encoded as an assignment to these varsAnother possible parsevanFirst, a familiar exampleCRF for POS taggingNow lets do dependency parsing!O(n2) boolean variables for the possibl
54、e links(cycle)56Local factors in a graphical modelfindpreferredlinksftttfPossible parse encoded as an assignment to these varsAnother possible parseAn illegal parsevanfFirst, a familiar exampleCRF for POS taggingNow lets do dependency parsing!O(n2) boolean variables for the possible links(cycle)57Lo
55、cal factors in a graphical modelfindpreferredlinkstttPossible parse encoded as an assignment to these varsAnother possible parseAn illegal parseAnother illegal parsevant(multiple parents)ffSo what factors shall we multiply to define parse probability?Unary factors to evaluate each link in isolation5
56、8Local factors for parsingfindpreferredlinkst 2f1t 1f2t 1f2t 1f6t 1f3as before, goodness of this link can depend on entireobserved input contextt 1f8some other linksarent as goodgiven this inputsentenceBut what if the best assignment isnt a tree?59Global factors for parsingSo what factors shall we m
57、ultiply to define parse probability?Unary factors to evaluate each link in isolationGlobal TREE factor to require that the links form a legal treethis is a “hard constraint: factor is either 0 or 1findpreferredlinksffffff0ffffft0fffftf0fftfft1tttttt0So what factors shall we multiply to define parse
58、probability?Unary factors to evaluate each link in isolationGlobal TREE factor to require that the links form a legal treethis is a “hard constraint: factor is either 0 or 160Global factors for parsingfindpreferredlinksffffff0ffffft0fffftf0fftfft1tttttt0tfftff64 entries (0/1)So far, this is equivale
59、nt toedge-factored parsing(McDonald et al. 2005).Note: McDonald et al. (2005) dont loop through this table to consider exponentially many trees one at a time.They use combinatorial algorithms; so should we!optionally require the tree to be projective (no crossing links)werelegal!So what factors shal
60、l we multiply to define parse probability?Unary factors to evaluate each link in isolationGlobal TREE factor to require that the links form a legal treethis is a “hard constraint: factor is either 0 or 1Second-order effects: factors on 2 variablesgrandparent61Local factors for parsingfindpreferredli
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 专题一冠词-定冠词练习题(含答案)
- 确认收到融资协议草案回复函(3篇)范文
- 预防性侵害风险共建阳光校园小学主题班会课件
- 安全第一记心中健康快乐伴成长小学主题班会课件
- 湖南省衡阳市蒸湘区2025届数学四年级第二学期期末质量检测模拟试题含答案解析
- 湖南省衡阳市渣江镇2025年三年级数学第一学期阶段联考模拟试题含答案解析
- 房地产购房攻略从选房到签约全流程方案
- 企业新品推广会邀请函3篇
- 弘扬感恩精神与培养感恩之心小学主题班会课件
- 跨部门协作标准化流程操作手册指导书
- 西北农林科技大学2026年强基计划面试+体育测试模拟试题及答案解析
- 2026年湖南公开遴选公务员考试(公务员综合知识)经典试题及答案
- 2026年湖北英语(专升本)真题及答案
- DB44-T 2848-2026 装配式污水处理设施设计建设标准
- 安庆市2025安徽安庆市市直事业单位公开招聘81人笔试历年参考题库典型考点附带答案详解
- GB/T 47427-2026合成纤维预取向丝(POY)动态热应力试验方法
- 2026年广东省汕头市龙湖区中考一模考试地理试题(含答案)
- 设计单位财务制度
- GA/T 2198-2024法庭科学可疑样品中毒品和易制毒化学品定性定量检验方法通用规则
- 郑州市金水区2025-2026学年第二学期三年级语文期末考试卷(部编版含答案)
- 2026年食品安全规章制度目录清单
评论
0/150
提交评论