
Advances in Word Sense Disambiguation
Tutorial at ACL 2005, June 25, 2005
Ted Pedersen, University of Minnesota, Duluth (/tpederse)
Rada Mihalcea, University of North Texas (/rada)

Goal of the Tutorial
- Introduce the problem of word sense disambiguation (WSD), focusing on the range of formulations and approaches currently practiced. Accessible to anyone with an interest in NLP.
- Persuade you to work on word sense disambiguation:
  - It's an interesting problem
  - Lots of good work already done, still more to do
  - There is infrastructure to help you get started
- Persuade you to use word sense disambiguation in your text applications.

Outline of Tutorial
- Introduction (Ted)
- Methodology (Rada)
- Knowledge Intensive Methods (Rada)
- Supervised Approaches (Ted)
- Minimally Supervised Approaches (Rada) / BREAK
- Unsupervised Learning (Ted)
- How to Get Started (Rada)
- Conclusion (Ted)

Part 1: Introduction

Outline
- Definitions
- Ambiguity for Humans and Computers
- Very Brief Historical Overview
- Theoretical Connections
- Practical Applications

Definitions
- Word sense disambiguation is the problem of selecting a sense for a word from a set of predefined possibilities.
  - The sense inventory usually comes from a dictionary or thesaurus.
  - Knowledge intensive methods, supervised learning, and (sometimes) bootstrapping approaches.
- Word sense discrimination is the problem of dividing the usages of a word into different meanings, without regard to any particular existing sense inventory.
  - Unsupervised techniques.

Computers versus Humans
- Polysemy: most words have many possible meanings.
- A computer program has no basis for knowing which one is appropriate, even if it is obvious to a human.
- Ambiguity is rarely a problem for humans in their day-to-day communication, except in extreme cases.

Ambiguity for Humans - Newspaper Headlines!
- DRUNK GETS NINE YEARS IN VIOLIN CASE
- FARMER BILL DIES IN HOUSE
- PROSTITUTES APPEAL TO POPE
- STOLEN PAINTING FOUND BY TREE
- RED TAPE HOLDS UP NEW BRIDGE
- DEER KILL 300,000
- RESIDENTS CAN DROP OFF TREES
- INCLUDE CHILDREN WHEN BAKING COOKIES
- MINERS REFUSE TO WORK AFTER DEATH

Ambiguity for a Computer
- The fisherman jumped off the bank and into the water.
- The bank down the street was robbed!
- Back in the day, we had an entire bank of computers devoted to this problem.
- The bank in that road is entirely too steep and is really dangerous.
- The plane took a bank to the left, and then headed off towards the mountains.

10、ions,Early Days of WSD,Noted as problem for Machine Translation (Weaver, 1949) A word can often only be translated if you know the specific sense intended (A bill in English could be a pico or a cuenta in Spanish) Bar-Hillel (1960) posed the following: Little John was looking for his toy box. Finall

11、y, he found it. The box was in the pen. John was very happy. Is “pen” a writing instrument or an enclosure where children play? declared it unsolvable, left the field of MT!,Since then,1970s - 1980s Rule based systems Rely on hand crafted knowledge sources 1990s Corpus based approaches Dependence on

12、 sense tagged text (Ide and Veronis, 1998) overview history from early days to 1998. 2000s Hybrid Systems Minimizing or eliminating use of sense tagged text Taking advantage of the Web,Outline,Definitions Ambiguity for Humans and Computers Very Brief Historical Overview Interdisciplinary Connections

Interdisciplinary Connections
- Cognitive Science: in what ways can word senses be distinguished?
  - With respect to a dictionary sense inventory: chair = a seat ("he put his coat over the back of the chair and sat down"); chair = the position of professor ("he was awarded an endowed chair in economics")
  - With respect to the translation in a second language: chair = chaise; chair = directeur
  - With respect to the context where it occurs (discrimination): "Sit on a chair", "Take a seat on this chair" vs. "The chair of the Math Department", "The chair of the meeting"

Approaches to Word Sense Disambiguation
- Knowledge-Based Disambiguation
  - use of external lexical resources such as dictionaries and thesauri
  - discourse properties
- Supervised Disambiguation
  - based on a labeled training set
  - the learning system has a training set of feature-encoded inputs AND their appropriate sense label (category)
- Unsupervised Disambiguation
  - based on unlabeled corpora
  - the learning system has a training set of feature-encoded inputs BUT NOT their appropriate sense label (category)

All Words Word Sense Disambiguation
- Attempt to disambiguate all open-class words in a text: "He put his suit over the back of the chair"
- Knowledge-based approaches
  - Use information from dictionaries: definitions/examples for each meaning; find the similarity between definitions and the current context
  - Position in a semantic network: find that "table" is closer to "chair/furniture" than to "chair/person"
  - Use discourse properties: a word exhibits the same sense in a discourse / in a collocation
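The dictionary-based idea above (compare each sense's definition against the current context) can be sketched as a simplified Lesk-style overlap count. This is a minimal illustration, not the algorithm of any one paper: the two glosses and the stop-word list are made up for the example.

```python
# Simplified Lesk-style sense selection: pick the sense whose
# dictionary definition shares the most words with the context.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "with", "and", "or"}

# Hypothetical glosses for two senses of "chair".
SENSES = {
    "chair/furniture": "a seat for one person with a support for the back",
    "chair/person": "the position of a professor or the officer who presides at a meeting",
}

def tokens(text):
    # Bag of non-stop words, lowercased.
    return {w for w in text.lower().split() if w not in STOPWORDS}

def lesk(context, senses):
    ctx = tokens(context)
    # Score each sense by word overlap between its gloss and the context.
    return max(senses, key=lambda s: len(ctx & tokens(senses[s])))

print(lesk("he put his suit over the back of the chair", SENSES))
# chair/furniture ("back" overlaps with the furniture gloss)
```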

All Words Word Sense Disambiguation
- Minimally supervised approaches
  - Learn to disambiguate words using small annotated corpora
  - E.g. the SemCor corpus, where all open-class words are disambiguated: 200,000 running words
  - Most frequent sense

Targeted Word Sense Disambiguation
- Disambiguate one target word: "Take a seat on this chair"; "The chair of the Math Department"
- WSD is viewed as a typical classification problem: use machine learning techniques to train a system
- Training: a corpus of occurrences of the target word, each occurrence annotated with the appropriate sense
- Build feature vectors: a vector of relevant linguistic features that represents the context (e.g. a window of words around the target word)
- Disambiguation: disambiguate the target word in new, unseen text

Targeted Word Sense Disambiguation
- Take a window of n words around the target word
- Encode information about the words around the target word; typical features include words, root forms, POS tags, frequency, …
- "An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps."
- Surrounding context (local features): (guitar, NN1), (and, CJC), (player, NN1), (stand, VVB)
- Frequent co-occurring words (topical features): fishing, big, sound, player, fly, rod, pound, double, runs, playing, guitar, band → 0,0,0,1,0,0,0,0,0,0,1,0
- Other features: followed by "player"; contains "show" in the sentence → yes, no, …
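The feature encoding described above — local words in a window around the target, plus a binary vector marking which frequent co-occurring words appear in the sentence — can be sketched like this. The window size and the shortened topical word list are illustrative assumptions.

```python
# Encode a target word's context as (1) local features: the words in a
# +/- n window around the target, and (2) topical features: a binary
# vector over a fixed list of frequent co-occurring words.
def context_features(words, target_index, topical_words, n=2):
    lo = max(0, target_index - n)
    local = words[lo:target_index] + words[target_index + 1:target_index + 1 + n]
    sentence = set(words)
    topical = [1 if w in sentence else 0 for w in topical_words]
    return local, topical

sent = "an electric guitar and bass player stand off to one side".split()
# Hypothetical list of words that frequently co-occur with "bass".
topical_words = ["fishing", "big", "sound", "player", "fly", "guitar", "band"]
local, topical = context_features(sent, sent.index("bass"), topical_words)
print(local)    # ['guitar', 'and', 'player', 'stand']
print(topical)  # [0, 0, 0, 1, 0, 1, 0]
```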

Unsupervised Disambiguation
- Disambiguate word senses without supporting tools such as dictionaries and thesauri, and without a labeled training text
- Without such resources, word senses are not labeled: we cannot say "chair/furniture" or "chair/person"
- We can:
  - Cluster/group the contexts of an ambiguous word into a number of groups
  - Discriminate between these groups without actually labeling them

Unsupervised Disambiguation
- Hypothesis: the same sense of a word will have similar neighboring words
- Disambiguation algorithm:
  - Identify context vectors corresponding to all occurrences of a particular word
  - Partition them into regions of high density
  - Assign a sense to each such region
- "Sit on a chair", "Take a seat on this chair" vs. "The chair of the Math Department", "The chair of the meeting"
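The algorithm above (context vectors partitioned into dense regions) can be sketched with a toy single-pass clustering over bag-of-words vectors. The similarity threshold and the choice of comparing against a cluster's first member are illustrative simplifications, not the method of any particular paper.

```python
from collections import Counter
import math

def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def discriminate(contexts, threshold=0.2):
    """Group the contexts of an ambiguous word without labeling the groups."""
    clusters = []  # each cluster is a list of context vectors
    for ctx in contexts:
        vec = Counter(ctx.lower().split())
        # Join the first cluster whose seed vector is similar enough...
        for cluster in clusters:
            if cosine(vec, cluster[0]) >= threshold:
                cluster.append(vec)
                break
        else:
            clusters.append([vec])  # ...otherwise start a new cluster
    return clusters

uses = ["sit on a chair", "take a seat on this chair",
        "the chair of the math department", "the chair of the meeting"]
print(len(discriminate(uses)))  # 2 (furniture uses vs. person uses)
```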

Evaluating Word Sense Disambiguation
- Metrics:
  - Precision = percentage of words that are tagged correctly, out of the words addressed by the system
  - Recall = percentage of words that are tagged correctly, out of all words in the test set
- Example: a test set of 100 words; the system attempts 75 words; 50 words are correctly disambiguated
  - Precision = 50 / 75 = 0.67
  - Recall = 50 / 100 = 0.50
- Special tags are possible: unknown, proper noun, multiple senses
- Compare to a gold standard: SemCor corpus, Senseval corpus, …
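The example arithmetic above, as a small helper:

```python
def precision_recall(correct, attempted, total):
    """Precision over attempted words; recall over all words in the test set."""
    return correct / attempted, correct / total

# The example from the slide: 100-word test set, 75 attempted, 50 correct.
p, r = precision_recall(correct=50, attempted=75, total=100)
print(round(p, 2), r)  # 0.67 0.5
```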

Evaluating Word Sense Disambiguation
- Difficulty in evaluation: the nature of the senses to distinguish has a huge impact on results
- Coarse versus fine-grained sense distinctions:
  - chair = a seat for one person, with a support for the back: "he put his coat over the back of the chair and sat down"
  - chair = the position of professor: "he was awarded an endowed chair in economics"
  - bank = a financial institution that accepts deposits and channels the money into lending activities: "he cashed a check at the bank"; "that bank holds the mortgage on my home"
  - bank = a building in which commercial banking is transacted: "the bank is on the corner of Nassau and Witherspoon"
- Sense maps:
  - Cluster similar senses
  - Allow for both fine-grained and coarse-grained evaluation

Bounds on Performance
- Upper and lower bounds on performance: a measure of how well an algorithm performs relative to the difficulty of the task.
- Upper bound: human performance, around 97%-99% with few and clearly distinct senses

- Inter-judge agreement: with words with clear […]

[…] otherwise, create a new chain.

Semantic Similarity of a Global Context
- "A very long train traveling along the rails with a constant velocity v in a certain direction …"
- train: #1 public transport; #2 ordered set of things; #3 piece of cloth
- travel: #1 change location; #2 undergo transportation
- rail: #1 a barrier; #2 a bar of steel for trains; #3 a small bird

Lexical Chains for WSD
- Identify lexical chains in a text; usually target one part of speech at a time
- Identify the meaning of words based on their membership in a lexical chain
- Evaluation:
  - (Galley and McKeown 2003): lexical chains on 74 SemCor texts give 62.09%
  - (Mihalcea and Moldovan 2000): lexical chains "anchored" on monosemous words, on five SemCor texts, give 90% precision with 60% recall
  - (Okumura and Honda 1994): lexical chains on five Japanese texts give 63.4%
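The chaining rule echoed in the fragment above ("… otherwise, create a new chain") can be sketched as follows. The relatedness table here is a hand-made stand-in for the example; real lexical-chain systems test relatedness through WordNet relations.

```python
# Toy lexical chaining: scan words in order; attach each word to the first
# chain containing a related word; otherwise, create a new chain.
RELATED = {  # hypothetical symmetric relatedness pairs (stand-in for WordNet)
    frozenset({"train", "rail"}),
    frozenset({"chair", "table"}),
    frozenset({"chair", "seat"}),
}

def related(a, b):
    return a == b or frozenset({a, b}) in RELATED

def build_chains(words):
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, c) for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])  # otherwise, create a new chain
    return chains

print(build_chains(["train", "chair", "rail", "seat"]))
# [['train', 'rail'], ['chair', 'seat']]
```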

Outline
- Task definition
- Machine Readable Dictionaries
- Algorithms based on Machine Readable Dictionaries
- Selectional Restrictions
- Measures of Semantic Similarity
- Heuristic-based Methods

Most Frequent Sense (1)
- Identify the most often used meaning and use this meaning by default
- Example: "plant/flora" is used more often than "plant/factory", so annotate any instance of PLANT as "plant/flora"
- Word meanings exhibit a Zipfian distribution, e.g. the distribution of word senses in SemCor

Most Frequent Sense (2)
- Method 1: find the most frequent sense in an annotated corpus
- Method 2: find the most frequent sense using a method based on distributional similarity (McCarthy et al. 2004)
  1. Given a word w, find the top k distributionally similar words Nw = {n1, n2, …, nk}, with associated similarity scores dss(w,n1), dss(w,n2), …, dss(w,nk)
  2. For each sense wsi of w, identify the similarity with the words nj, using the sense of nj that maximizes this score
  3. Rank the senses wsi of w based on the total similarity score

Most Frequent Sense (3)
- Word senses: pipe #1 = tobacco pipe; pipe #2 = tube of metal or plastic
- Distributionally similar words: N = {tube, cable, wire, tank, hole, cylinder, fitting, tap, …}
- For each word in N, find the similarity with pipe#i (using the sense that maximizes the similarity): pipe#1 - tube(#3) = 0.3; pipe#2 - tube(#1) = 0.6
- Compute a score for each sense pipe#i: score(pipe#1) = 0.25; score(pipe#2) = 0.73
- Note: results depend on the corpus used to find the distributionally similar words, so this can find domain-specific predominant senses
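The pipe example above can be reproduced with a small ranking function in the spirit of the McCarthy et al. (2004) method. This is a sketch: the similarity values below are hypothetical illustrations (only the tube numbers come from the slide), and a real system would derive them from a distributional thesaurus plus a WordNet-based similarity measure.

```python
# Predominant-sense ranking, sketched: each sense of the target word is
# scored by its total similarity to the target's distributionally similar
# neighbors, where each neighbor contributes via whichever of ITS senses
# maximizes the similarity.
def rank_senses(sense_neighbor_sims):
    """sense_neighbor_sims: {sense: {neighbor: max similarity over the
    neighbor's senses}} -> list of senses sorted best-first."""
    scores = {s: sum(sims.values()) for s, sims in sense_neighbor_sims.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical similarities for two senses of "pipe" and two neighbors.
sims = {
    "pipe#1 (tobacco pipe)": {"tube": 0.3, "cable": 0.1},
    "pipe#2 (tube of metal or plastic)": {"tube": 0.6, "cable": 0.4},
}
print(rank_senses(sims)[0])  # pipe#2 (tube of metal or plastic)
```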

One Sense Per Discourse
- A word tends to preserve its meaning across all its occurrences in a given discourse (Gale, Church, Yarowsky 1992)
- What does this mean? E.g. if the ambiguous word PLANT occurs 10 times in a discourse, all instances of "plant" carry the same meaning
- Evaluation: 8 words with two-way ambiguity, e.g. plant, crane, etc.; 98% of the two-word occurrences in the same discourse carry the same meaning
- The grain of salt: performance depends on granularity
  - (Krovetz 1998): experiments with words with more than two senses; the performance of "one sense per discourse" measured on SemCor is approx. 70%

One Sense per Collocation
- A word tends to preserve its meaning when used in the same collocation (Yarowsky 1993)
  - E.g. the ambiguous word PLANT preserves its meaning in all its occurrences within the collocation "industrial plant", regardless of the context where this collocation occurs
  - Strong for adjacent collocations; weaker as the distance between words increases
- Evaluation: 97% precision on words with two-way ambiguity
- Finer granularity: (Martinez and Agirre 2000) tested the "one sense per collocation" hypothesis on text annotated with WordNet senses: 70% precision on SemCor words

References
- (Agirre and Rigau 1995) Agirre, E. and Rigau, G. A proposal for word sense disambiguation using conceptual distance. RANLP 1995.
- (Agirre and Martinez 2001) Agirre, E. and Martinez, D. Learning class-to-class selectional preferences. CoNLL 2001.
- (Banerjee and Pedersen 2002) Banerjee, S. and Pedersen, T. An adapted Lesk algorithm for word sense disambiguation using WordNet. CICLing 2002.
- (Cowie, Guthrie and Guthrie 1992) Cowie, J., Guthrie, J.A., and Guthrie, L. Lexical disambiguation using simulated annealing. COLING 1992.
- (Gale, Church and Yarowsky 1992) Gale, W., Church, K., and Yarowsky, D. One sense per discourse. DARPA workshop 1992.
- (Halliday and Hasan 1976) Halliday, M. and Hasan, R. Cohesion in English. Longman, 1976.
- (Galley and McKeown 2003) Galley, M. and McKeown, K. Improving word sense disambiguation in lexical chaining. IJCAI 2003.
- (Hirst and St-Onge 1998) Hirst, G. and St-Onge, D. Lexical chains as representations of context in the detection and correction of malapropisms. In WordNet: An Electronic Lexical Database, MIT Press, 1998.
- (Jiang and Conrath 1997) Jiang, J. and Conrath, D. Semantic similarity based on corpus statistics and lexical taxonomy. ROCLING 1997.
- (Krovetz 1998) Krovetz, R. More than one sense per discourse. ACL-SIGLEX 1998.
- (Lesk 1986) Lesk, M. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. SIGDOC 1986.
- (Lin 1998) Lin, D. An information-theoretic definition of similarity. ICML 1998.
- (Martinez and Agirre 2000) Martinez, D. and Agirre, E. One sense per collocation and genre/topic variations. EMNLP 2000.
- (Miller et al. 1994) Miller, G., Chodorow, M., Landes, S., Leacock, C., and Thomas, R. Using a semantic concordance for sense identification. ARPA Workshop 1994.
- (Miller 1995) Miller, G. WordNet: A lexical database. Communications of the ACM, 38(11), 1995.
- (Mihalcea and Moldovan 1999) Mihalcea, R. and Moldovan, D. A method for word sense disambiguation of unrestricted text. ACL 1999.
- (Mihalcea and Moldovan 2000) Mihalcea, R. and Moldovan, D. An iterative approach to word sense disambiguation. FLAIRS 2000.
- (Mihalcea, Tarau and Figa 2004) Mihalcea, R., Tarau, P., and Figa, E. PageRank on semantic networks with application to word sense disambiguation. COLING 2004.
- (Patwardhan, Banerjee and Pedersen 2003) Patwardhan, S., Banerjee, S., and Pedersen, T. Using measures of semantic relatedness for word sense disambiguation. CICLing 2003.
- (Rada et al. 1989) Rada, R., Mili, H., Bicknell, E., and Blettner, M. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1), 1989.
- (Resnik 1993) Resnik, P. Selection and Information: A Class-Based Approach to Lexical Relationships. University of Pennsylvania, 1993.
- (Resnik 1995) Resnik, P. Using information content to evaluate semantic similarity. IJCAI 1995.
- (Vasilescu, Langlais and Lapalme 2004) Vasilescu, F., Langlais, P., and Lapalme, G. Evaluating variants of the Lesk approach for disambiguating words. LREC 2004.
- (Yarowsky 1993) Yarowsky, D. One sense per collocation. ARPA Workshop 1993.

Part 4: Supervised Methods of Word Sense Disambiguation

Outline
- What is Supervised Learning?
- Task Definition
- Single Classifiers
  - Naïve Bayesian Classifiers
  - Decision Lists and Trees
- Ensembles of Classifiers

What is Supervised Learning?
- Collect a set of examples that illustrate the various possible classifications or outcomes of an event.
- Identify patterns in the examples associated with each particular class of the event.
- Generalize those patterns into rules.
- Apply the rules to classify a new event.

Learn from these examples: "when do I go to the store?"

Task Definition
- Supervised WSD: the class of methods that induces a classifier from manually sense-tagged text using machine learning techniques.
- Resources:
  - Sense-tagged text
  - Dictionary (implicit source of the sense inventory)
  - Syntactic analysis (POS tagger, chunker, parser, …)
- Scope:
  - Typically one target word per context
  - Part of speech of the target word resolved
  - Lends itself to the "targeted word" formulation
- Reduces WSD to a classification problem, where a target word is assigned the most appropriate sense from a given set of possibilities, based on the context in which it occurs

Sense Tagged Text

Two Bags of Words (co-occurrences in the "window of context")

Simple Supervised Approach
Given a sentence S containing "bank":

    For each word W_i in S
        If W_i is in FINANCIAL_BANK_BAG then Sense_1 = Sense_1 + 1
        If W_i is in RIVER_BANK_BAG then Sense_2 = Sense_2 + 1
    If Sense_1 > Sense_2 then print "Financial"
    else if Sense_2 > Sense_1 then print "River"
    else print "Can't Decide"
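The bag-of-words pseudocode above runs as written in Python; the two word bags below are tiny illustrative stand-ins for bags that would be built from sense-tagged training text.

```python
# Count context words found in each sense's bag and pick the larger count.
FINANCIAL_BANK_BAG = {"money", "loan", "robbed", "cashed", "check", "deposit"}
RIVER_BANK_BAG = {"water", "fisherman", "river", "steep", "shore", "mud"}

def classify_bank(sentence):
    sense_1 = sense_2 = 0
    for w in sentence.lower().split():
        if w in FINANCIAL_BANK_BAG:
            sense_1 += 1
        if w in RIVER_BANK_BAG:
            sense_2 += 1
    if sense_1 > sense_2:
        return "Financial"
    if sense_2 > sense_1:
        return "River"
    return "Can't Decide"

print(classify_bank("the fisherman jumped off the bank and into the water"))
# River
```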

Supervised Methodology
- Create a sample of training data where a given target word is manually annotated with a sense from a predetermined set of possibilities.
  - One tagged word per instance (lexical-sample disambiguation)

- Select a set of features with which to represent context: co-occurrences, collocations, POS tags, verb-object relations, etc.
- Convert the sense-tagged training instances to feature vectors.
- Apply a machine learning algorithm to induce a classifier.
  - Form: structure or relation among features
  - Parameters: strength of feature interactions
- Convert a held-out sample of test data into feature vectors.
  - "Correct" sense tags are known but not used
- Apply the classifier to test instances to assign a sense tag.

From Text to Feature Vectors
My/pronoun grandfather/noun used/verb to/prep fis…
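The methodology above (tagged instances → feature vectors → induced classifier → tag new instances) can be sketched end to end with a tiny Naive-Bayes-style count model. This is a sketch under stated assumptions: the training sentences are invented examples, the features are just bag-of-words counts, and a real system would train on a sense-tagged corpus such as SemCor with richer features.

```python
from collections import Counter, defaultdict
import math

def train(tagged_contexts):
    """tagged_contexts: list of (context string, sense label) pairs."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for text, sense in tagged_contexts:
        sense_counts[sense] += 1
        word_counts[sense].update(text.lower().split())
    return word_counts, sense_counts

def classify(text, word_counts, sense_counts):
    total = sum(sense_counts.values())
    vocab = len(set().union(*word_counts.values()))
    best, best_lp = None, -math.inf
    for sense in sense_counts:
        n = sum(word_counts[sense].values())
        lp = math.log(sense_counts[sense] / total)  # log prior
        for w in text.lower().split():
            # Add-one smoothing so unseen words don't zero out a sense.
            lp += math.log((word_counts[sense][w] + 1) / (n + vocab))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

train_data = [
    ("he cashed a check at the bank", "financial"),
    ("the bank approved the loan", "financial"),
    ("the fisherman sat on the river bank", "river"),
    ("the bank of the stream was muddy", "river"),
]
wc, sc = train(train_data)
print(classify("he cashed a check", wc, sc))  # financial
```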
