




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、thesis onartificial intelligence for universal networking language (unl)(perspective bengali language)bydeen islam muslimid: 200720851ariful hoque tuhinid: 200710698shohanur rahmanid: 200720100 under the supervision ofmd. ahsan arif, sr. lecturerdept. of computer science and engineeringasian univers
2、ity of bangladesh artificial intelligence for universal networking language (unl)(perspective bengali language)deen islam muslim, ariful hoque tuhin, shohanur rahmamdepartment of computer science and engineeringasian university of bangladeshabstract:in this paper we present the computational analysi
3、s of the complex case structure of bengali- a member of the indo aryan family of languages- with a view toward interlingua based mt. bengali is ranked 4th in the list of languages ordered according to the size of the population that speaks the language. extremely interesting language phenomena invol
4、ving morphology, case structure, word order and word senses make the processing of bengali a worthwhile and challenging proposition. a recently proposed scheme called the universal networking language has been used as the interlingua. the approach is adaptable to other members of the vast indo aryan
5、 language family. the parallel development of both the analyzer and the generator system leads to an insightful intra-system verification process in place. our approach is rule based and makes use of authoritative treatises on bengali grammar and develop rules for certain bengali to unl conversion p
6、rocess. introduction:about 189 million people speak bengali and is ranked 4th in the world in terms of the number of people speaking the language (ref:/mhealy/g101ilec/intro/clt/cltclt/top100.html). like most languages in the indo aryan family, descende
7、d from sanskrit, bengali has the sov structure with some typical characteristics. a motivating factor for creating a system for processing bengali is the possibility of laying the framework for processing many other bengal languages too.work on indian language processing abounds. project anubaad 1 f
8、or machine translation from english to bengali in the newspaper domain uses the direct translation approach. angalabharati 2 system for english bengal machine translation is based on pattern directed rules for english, which generates a pseudo-target-language applicable to a group of indian language
9、s. in matra 3, a web based mt system for english to bengal in the newspaper domain, the input text is transformed into case-frame like structures and parameterized templates generate the target language. the mantra mt system for official documents uses tree adjoining grammar (tag) to achieve english
10、 bengal mt (ref: project anusaaraka 4 is a language accessor system rather than an mt system and addresses multiple indian languages. interlingua based mt for english, bengal and marathi 5 6, that uses the unl, transforms the source text into the unl representation and generates target text from thi
11、s intermediate representation. references to most of these works can also be found at .in/mat/ach-mat.htm. other famous mt systems are pivot 7, atlas 8, kant 9, aries 10, geta 11, systran 12 etc. the universal networking language (unl) () has been defin
12、ed as a digital meta language for describing, summarizing, refining, storing and disseminating information in a machine independent and human language neutral form. the information in a document is represented sentence by sentence. each sentence is converted into a directed hyper graph having concep
13、ts as nodes and relations as arcs. knowledge within a document is expressed in three dimensions: 1. word knowledge is expressed by universal words (uws), which are language independent. these uws are tagged using restrictions describing the sense of the word in the current context. for example, drin
14、k(icl > liquor) denotes the noun sense of drink restricting the sense to a type of liquor. here, icl stands for inclusion and forms an is-a relationship like in semantic nets.2. conceptual knowledge is captured by relating uws through a set of unl relations 14. for example, humans affect the envi
15、ronment is described in the unl asagt(affect(icl>do).present.entry, human(icl>animal).pl)obj(affect(icl>do).present.entry, environment(icl>abstract thing).pl)agt means the agent and obj the object. affect(icl > do), human(icl >animal) and environment(icl > abstract thing) are th
16、e uws denotingconcepts.3. speakers view, aspect, time of event, etc. are captured by unl attributes.for instance, in the above example, the attribute entry denotes the mainpredicate of the sentence, present the present tense and pl the pluralnumber.the above discussion can be summarized using the ex
17、ample below john, who is the chairman of the company, has arranged a meeting at his residencethe unl for the sentence is;= unl =mod(chairman(icl>post).present.def,company(icl>institution).def)aoj(chairman(icl>post).present.def, john(icl>person)agt(arrange(icl>do).plet
18、e, john(icl>person)pos(residence(icl>shelter), john(icl>person)obj(arrange(icl>do).plete, meeting(icl>event).indef)plc(arrange(icl>do).plete, residence(icl>shelter);=in the expressions above, agt denotes the agent relation, obj the object relati
19、on, plc the place relation, pos is the possessor relation, mod is the modifier relation and aoj is the attribute-of-the-object (used to express constructs like a is b) relation. the detailed specification of the universal networking language can be found at /unlsys.our work
20、is based on an authoritative treatise on bengali grammar. the strategies of analysis and generation of linguistic phenomena have been guided by rigorous grammatical principles.universal networking language based analysis and generation for bengalibangla-english dictionarybangla to english dictionary
21、 is the source of building a bangla to unl dictionary asuniversal words are english words mandated by unl. such dictionaries also provide allattributes along with the meaning of a word. any entry in the dictionary is put in thefollowing format:hw id “uw” (attribute1, attribute2 . . .) <flg, fre,
22、pri>here,hw < head word (bangla word)id < identification of head word (omitable)uw < universal wordattribute < attribute of the hwflg < language flagfre < frequency of head wordpri < priority of head wordsome example entries of dictionary for bangla are given below:shohor “ci
23、ty(icl>region)” (n, place) <b,0,0>prochur “huge(icl>big)” (adj) <b,0,0>here the attributes,n stands for nounplace stands for placeadj stands for adjectiveflg field entry is b which stands for banglaa universal knowledge base is defined in unl specification. this knowledge base isla
24、nguage independent and each native language word should be referenced to this knowledgebase. the knowledge base of universal words is a hierarch of concepts.en-converter and de-converter machinesthe en-converter (henceforth called enco) is a language-independent parser, a multi-headed turing machine
25、 providing a framework for morphological, syntactic and semantic analysis synchronously using the uw dictionary and analysis rules. the structure of the machine is shown in the figure 1.fig. 1. the enco machinethe machine has two types of heads- processing heads and context heads.the processing head
26、s (2 nos.) are called analysis windows (aw) and the context heads are called condition windows (cw). the machine traverses the sentence back and forth, retrieves the relevant universal words from the lexicon and, depending on the attributes of the nodes under the aws and those under the surrounding
27、cws, generates semantic relations between the uws and/or attaches speech act attributes to them. the final output is a set of unl expressions equivalent to a unl graph. the de-converter (henceforth called the deco) 18 is a language-independent generator that produces sentences from unl graphs (figur
28、e 2).fig. 2. the deco machinelike enco, deco too is a multi-headed turing machine. it does syntactic and morphological generation synchronously using the lexicon and the set of generation rules.existing problems:1) spell checking:by somehow, spell checker is not included with the current unl system.
29、 for that, there is a possibility to arising wrong output during en-conversion or de-conversion process. an example of such situation is given below: a simple english sentence in a right spelling form:i live in bangladeshaccording to 13, the final unl expression is as follows:aoj(live(icl>inhabit
30、>be,aoj>living_thing,plc>place).entry.present,i(icl>person)plc(live(icl>inhabit>be,aoj>living_thing,plc>place).entry.present,bangladesh(iof>asian_country>thing)where “bangladesh” is assigned to the uw of “asian_country”, which tells the de-converter to search “banglades
31、h” word in “asian_country” category. but if we if we type the “bangladesh” word in wrong spelling like “banladesh” then it convert that word in such form:aoj(live(icl>inhabit>be,aoj>living_thing,plc>place).entry.present,i(icl>person)plc(live(icl>inhabit>be,aoj>living_thing,pl
32、c>place).entry.present,banladesh)it does not define any uw for “bangladesh”. there for wrong conversion can be occurred. 2) maintaining exact grammatical pattern:according to 13, for a single sentence or like some compound sentences it works fine. but we have found some crucial problems when en-c
33、onverting and de-converting some multiple sentences.for an example, using this sentence:i like rice and i play football.en-converting this sentence in unl forms: aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.present,i(icl>person):01)obj:01(like(icl>please>be,eq
34、u>enjoy,obj>uw,aoj>person).entry.present,rice(icl>grain>thing)agt:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).entry.present,i(icl>person):02)obj:02(play(icl>compete>do,agt>thing,obj>uw,ptn>thing).entry.present,football(icl>field_game>thing
35、)and(:02,:01)after de-converting this unl form in english:i like rice and i play football.but, problem occurs when using some multiple sentences like:we play football. we try to win every match.en-converting this sentence in unl forms:agt(try(icl>attempt>do,agt>person,obj>uw).entry.prese
36、nt,we(icl>group):01.pl)pos:01(match(icl>contest>thing),we(icl>group):02)fictit(try(icl>attempt>do,agt>person,obj>uw).entry.present,every(icl>quantity,per>thing)obj:01(win(icl>prize>do,agt>thing,obj>thing,scn>thing).entry,match(icl>contest>thing)obj(
37、try(icl>attempt>do,agt>person,obj>uw).entry.present,:01)after de-converting this unl form back to english:we try win we match.more examplei am rahim. i am a student of asian university of bangladesh.en-converting this sentence in unl forms:aoj(rahim.entry.present,i(icl>person):01)fict
38、it(rahim.entry.present,i(icl>person):02)aoj(student(icl>university_student>person,obj>knowledge_domain).indef.present,i(icl>person):02)mod(university(icl>body>thing),asian(icl>adj,com>asia)obj(student(icl>university_student>person,obj>knowledge_domain).indef.prese
39、nt,university(icl>body>thing)obj(university(icl>body>thing),bangladesh(iof>asian_country>thing)after de-converting this unl form back to english:i be rahim. i a student of an asian university of bangladesh3) lack of exact rules and absence of ai: lets try some tricks with unl:my na
40、me is casper and i don't like to play football.according to 13, en-conversion takes this sentence in unl form like this way:pos(name(icl>language_unit>thing,pos>thing),i(icl>person):01)aoj(casper(iof>city>thing).entry.present,name(icl>language_unit>thing,pos>thing)and(
41、:01,casper(iof>city>thing).entry.present)aoj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.not.present,i(icl>person):02)obj:01(like(icl>please>be,equ>enjoy,obj>uw,aoj>person).entry.not.present,play(icl>compete>do,agt>thing,obj>uw,ptn>t
42、hing)obj:01(play(icl>compete>do,agt>thing,obj>uw,ptn>thing),football(icl>field_game>thing)where name “casper” categorized under “city”, not as a person name. similarly for bangla to unl conversion we can give an example like:golapi ekhon aar train-e uthena.during the en-conversi
43、on, we may face the problem to identify whether golapi is a person name or a color name.4) verb representation problem:let take a look at this example:i am a good boy.en-converted unl form:aoj(boy(icl>child>person,ant>girl).entry.indef.present,i(icl>person)mod(boy(icl>child>person,
44、ant>girl).entry.indef.present,good(icl>adj,ant>bad)de-converted english:i be a good boy.another example:my name is karim and i like flowers.en-converted unl form:pos(name(icl>language_unit>thing,pos>thing),i(icl>person):01)aoj(kerim(icl>name,iof>person,com>male).entry.p
45、resent,name(icl>language_unit>thing,pos>thing)and(i(icl>person):02,karim(icl>name,iof>person,com>male).entry.present)man(kerim(icl>name,iof>person,com>male).entry.present,like(icl>how,obj>thing)man(i(icl>person):02,like(icl>how,obj>thing)obj(like(icl>h
46、ow,obj>thing),flower(icl>angiosperm>thing).pl)de-converted english:the name of i like is karim and i this like the flowers.recommended solution: 1) implementation of a good spell checker 2) rearrange the sentence in exact grammatical pattern3) implementation of ai and exact rules4) determin
47、e the exact form of verb representationconclusionssystematic analysis of the case structure forms the foundation for any natural language processing system. in this paper, we have described a system for the computational analysis of the bengali case structure for the purpose of interlingua based mt
48、using unl. the complementary generator system too has been implemented, which provides the platform for intra system verification. verification via cross system generation is being done using the bengal generation system (also under development.) apart from the case structure, computational analysis
49、 based on authoritative grammatical treatise, addressing complex phenomena involving verbs, adjectives and adverbs is under way.references:1 dey, k.: project anubaad: an english-bengali mt system. jadavpur university, kolkata (2001)2 sinha, r.: machine translation: the indian context. akshara94, new
50、 delhi (1994)3 rao, d., mohanraj, k., hedge, j., mehta, v., mahadane, p.: a practical framework for syntactic transfer of compound-complex sentences for english-bengal machine translation. (2000)4 bharati, a., chaitanya, v., sanyal, r.: natural language processing: a paninian perspective. prentice h
51、all india private limited (1996)5 dave, s., bhattacharya, p., girishbhai, p.j.: interlingua based english-bengal machine translation and language divergence. journal of machine translation, volume 17 (2002)6 monju, m., sachi, d., bhattacharyya, p.: knowledge extraction from bengal texts. knowledge b
52、ased computer systems, proceedings of the international conference kbcs2000 (2000)7 muraki, k.: pivot: two-phase machine translation system. mt summit manuscripts and program, pp. 81-83 (1987)8 uchida, h.: atlas. mt summit ii, pp. 152-157 (1989)9 lonsdale, d.w., franz, a.m., leavitt, j.r.r.: large-s
53、cale machine translation: an interlingua approach. (/research/kant/pdf/aei94.pdf)10 gonzlez, j.c., go, j.m., nieto, a.f.: aries: a ready for use platform for engineering spanish-processing tools. digest of the second language engineering convention, pages 219-226 (1995) 11 vauquois
54、, b., boitet, c.: automated translation at grenoble university. (/j/j85/j85-1003.pdf)12 w.john, h., l., s.h.: an introduction to machine translation. london: acamedic press (1992)13 russian and english language server, the russian unl language center, www.unl.ru.我的大学爱情观1、什么是大学爱情:大学是一个相对宽松,时间自由,自己支配的环境,也正因为这样,培植爱情之花最肥沃的土地。大学生恋爱一直是大学校园的热门话题,恋爱和学业也就自然成为了大学生在校期间面对的两个主要问题。恋爱关系处理得好、正确,健康,可以成为学习和事业的催化剂,使人学习努力、成绩上升;恋爱关系处理的不当,不健康,可能分散精力、浪费时间、情绪波动、成绩下
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025至2030年中国乳白静电膜市场调查研究报告
- 9《作息有规律》(教学设计)2024-2025学年统编版(2024)道德与法治一年级上册
- 2024年小自考汉语言文学大纲解析试题及答案
- 食品安全员考试经典知识点试题与答案
- 2025至2030年中国GPS支架行业投资前景及策略咨询报告
- 第一章第一节地球的宇宙环境教学设计 -2024-2025学年人教版地理七年级上册
- 高中语文 第三单元 第8课 兰亭集序教学设计1 新人教版必修2
- 2024年电力行业常见问题试题及答案
- 2024-2025学年高中物理 第十三章 光 3 光的干涉教学设计2 新人教版选修3-4
- 小自考公共事业管理考试测验试题及答案
- 事故隐患报告和奖励制度
- 新建项目员工四新培训
- 试岗期七天试岗协议书范文
- 2024年彩色锆石项目可行性研究报告
- DB3402T 59-2023 露天矿山无人驾驶矿车作业通 用要求
- 人教版四年级下册音乐全册表格式教案(集体备课)
- 西方文论概览(第二版)-第六章课件
- 初中语文教材常见问题答疑(八年级)
- 2024燃煤机组锅炉水冷壁高温腐蚀防治技术导则
- 2024北京一零一中初三(下)英语月考试卷和答案
- 2025届高考语文复习:语言文字运用之句子的表达效果+课件
评论
0/150
提交评论