GATE Introduction
GATE is open source software written in Java, used mainly to extract information from unstructured data. GATE components mainly support the following capabilities:
· Natural language processing (NLP).
· Information extraction in many languages.

GATE components:
GATE works mainly on textual resources such as Word documents, text documents, HTML, PDF and XML. It provides different types of built-in plugins, and it is easy to get started from the GATE Developer GUI. The GATE components are:
· Language Resources (LRs): corpus (a set of documents), document, annotations.
· Processing Resources (PRs): for example ANNIE (GATE plugins).
· Application: a combination of language resources and processing resources, in which a sequence of processing resources is applied to the language resources.
GATE also has Visual Resources (VRs), the GUI components used to view and edit language and processing resources; an application is not itself a visual resource.

Use case
This use case shows how to use the GATE Developer GUI and how to create a sample GATE application that uses GATE plugins to extract meaningful information from unstructured content.

Prerequisites:
· Install JDK 1.6+
· Install GATE (GATE 7.1) from http://gate.ac.uk/download/

Solution:
The following plugins are used to demonstrate the functionality of the GATE system.
ANNIE plugin: provides the following functionality.
· Default tokeniser: splits the text into tokens.
· Sentence splitter: splits the text into sentences based on punctuation.
· Gazetteer lookup.
Number Tagger plugin:
· Finds numbers written in either words or digits and annotates them with their numeric values.

GATE Developer GUI:
Launch the GATE application from the installation directory.
STEPS:
· Open the File menu and click “Manage plugins”, which is used to load the required plugins. Load the ANNIE plugin, the JAPE Plus Transducer plugin and the Tagger_Numbers plugin.
· Select Language Resources and create a GATE document.
· Select Processing Resources and load the ANNIE sentence splitter and English tokeniser. Then load the Number Tagger processing resource.
· Select Applications and create a sample pipeline application. Add the processing resources, then select the language resources and run the application.
The annotation sets and the annotation list highlight the output of the plugins. The Number Tagger creates “Number” annotations, covering numbers written in words as well as in digits, each carrying its numeric value.

GATE Embedded Sample:
Create a sample Java project as shown in the screenshot; its main class is GateMain.
GATE jars:
Copy gate.jar from the bin directory ($GATE_HOME/bin) and the NumbersTagger jar from $GATE_HOME/plugins/Tagger_Numbers. The other supporting jars are in the lib folder of the GATE installation directory ($GATE_HOME/lib).
Steps:
· Initialize the GATE system. This requires the GATE_HOME environment variable: create a system variable named GATE_HOME that points to the GATE installation directory.
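Before starting GATE from code, it can help to confirm that GATE_HOME is set and that the locations named above exist. The following is a minimal sketch in plain Java with no GATE dependency; the class GateHomeCheck and its helper methods are hypothetical names chosen for illustration, and the only paths checked are the ones mentioned in this tutorial.

```java
import java.io.File;

/** Hypothetical pre-flight check for a GATE installation; not part of GATE. */
class GateHomeCheck {

    /** Returns the directory named by the given value, or null if it is unset or blank. */
    static File parseHome(String value) {
        if (value == null || value.trim().isEmpty()) {
            return null;
        }
        return new File(value.trim());
    }

    /** Builds a path below the installation directory from its segments. */
    static File under(File gateHome, String... segments) {
        File f = gateHome;
        for (String s : segments) {
            f = new File(f, s);
        }
        return f;
    }

    public static void main(String[] args) {
        File home = parseHome(System.getenv("GATE_HOME"));
        if (home == null) {
            System.err.println("GATE_HOME is not set");
            return;
        }
        // Locations taken from the tutorial text: bin/gate.jar, lib, plugins/Tagger_Numbers.
        File[] expected = {
            under(home, "bin", "gate.jar"),
            under(home, "lib"),
            under(home, "plugins", "Tagger_Numbers")
        };
        for (File f : expected) {
            System.out.println((f.exists() ? "found   " : "missing ") + f.getPath());
        }
    }
}
```

Running it with GATE_HOME unset prints a single error line; otherwise it reports each expected location as found or missing, which narrows down classpath and plugin problems before any GATE code runs.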
GateMain.java

package com.example.gate.main;

import gate.Annotation;
import gate.AnnotationSet;
import gate.Factory;
import gate.Gate;
import gate.creole.ANNIEConstants;
import gate.creole.SerialAnalyserController;
import gate.creole.numbers.AnnotationConstants;
import gate.util.GateException;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.net.MalformedURLException;
import java.util.Iterator;
import java.util.TreeSet;

public class GateMain {

    private SerialAnalyserController controller;
    private String inputASname;

    /**
     * Main method to execute a simple GATE application.
     *
     * @param args command-line arguments (unused)
     * Jun 29, 2014
     */
    public static void main(String[] args) {
        GateMain m = new GateMain();
        m.initializeGate();
        m.process();
    }

    /**
     * Adds the input document to the GATE corpus, runs the controller and
     * reads the values from the annotations, selected by annotation name.
     *
     * Jun 29, 2014
     */
    private void process() {
        try {
            System.out.println("Reading the input files and adding the document to the corpus.");
            String input = readInput();
            System.out.println("Input: " + input);
            controller.getCorpus().add(Factory.newDocument(input));
            System.out.println("GATE controller executing.");
            controller.execute();
            Iterator<gate.Document> documentIterator = controller.getCorpus().iterator();
            // iterate over the documents in the corpus
            while (documentIterator.hasNext()) {
                gate.Document doc = documentIterator.next();
                doc.getFeatures().clear();
                // use the default annotation set unless a named set was configured
                AnnotationSet inputAnnSet = (inputASname == null || inputASname.length() == 0)
                        ? doc.getAnnotations()
                        : doc.getAnnotations(inputASname);
                Iterator<Annotation> numberIterator =
                        inputAnnSet.get(AnnotationConstants.NUMBER_ANNOTATION_NAME).iterator();
                // iterate over the Number annotations and print their numeric values
                while (numberIterator.hasNext()) {
                    Annotation numberAnnotation = numberIterator.next();
                    BigDecimal bd = new BigDecimal(String.valueOf(
                            numberAnnotation.getFeatures().get(AnnotationConstants.VALUE_FEATURE_NAME)));
                    System.out.println("Number tagger Plugin output: " + bd.longValue());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * Reads the input from the input directory (user.dir/input/input.txt).
     *
     * @return the content of the input file
     * Jun 29, 2014
     * @throws IOException if the file cannot be read
     */
    private String readInput() throws IOException {
        String line = "";
        StringBuilder fileContent = new StringBuilder();
        StringBuilder input = new StringBuilder();
        input.append(System.getProperty("user.dir"));
        input.append(File.separator).append("input");
        input.append(File.separator).append("input.txt");
        FileReader reader = new FileReader(new File(input.toString()));
        BufferedReader br = new BufferedReader(reader);
        try {
            while ((line = br.readLine()) != null) {
                fileContent.append(line).append("\n");
            }
        } finally {
            br.close();
        }
        return fileContent.toString();
    }

    /**
     * Initializes GATE.
     *
     * STEP1: Initialize the GATE home. STEP2: Initialize the GATE controller.
     * STEP3: Register the processing resources.
     *
     * Jun 29, 2014
     */
    @SuppressWarnings("deprecation")
    private void initializeGate() {
        try {
            System.out.println("Initializing the GATE.");
            Gate.setGateHome(new File(System.getenv("GATE_HOME")));
            Gate.init();
            String taggerNumber = Gate.getPluginsHome() + File.separator + "Tagger_Numbers";
            String japeHome = Gate.getPluginsHome() + File.separator + "JAPE_Plus";
            // registering the processing resources
            System.out.println("Registering the resources.");
            Gate.getCreoleRegister().registerDirectories(new File(japeHome).toURL());
            Gate.getCreoleRegister().registerDirectories(new File(taggerNumber).toURL());
            // ASSUMPTION: the visible listing imports ANNIEConstants but never registers ANNIE;
            // the tokeniser and sentence splitter are defined by that plugin, so it is registered here.
            Gate.getCreoleRegister().registerDirectories(
                    new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURL());
            controller = (SerialAnalyserController)
                    Factory.createResource("gate.creole.SerialAnalyserController");
            String[] processingResources = {
                "gate.creole.tokeniser.DefaultTokeniser",
                "gate.creole.splitter.SentenceSplitter",
                "gate.creole.numbers.NumbersTagger"
            };
            // adding the processing resources to the GATE controller
            for (int pr = 0; pr < processingResources.length; pr++) {
                // ASSUMPTION: the source listing is cut off inside this loop. The loop body,
                // the corpus creation and the catch blocks below are an assumed completion.
                controller.add((gate.ProcessingResource)
                        Factory.createResource(processingResources[pr]));
            }
            // process() adds documents to controller.getCorpus(), so a corpus must exist (assumed name)
            controller.setCorpus(Factory.newCorpus("sample corpus"));
        } catch (GateException e) {
            e.printStackTrace();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }
    }
}
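The listing hands a list of processing resources to a SerialAnalyserController, which applies them in order to each document in its corpus, and later stages read the annotations that earlier stages produced. A minimal sketch of that idea in plain Java, with no GATE dependency, is given below. MiniPipeline, Doc, Span, Stage, WhitespaceTokeniser and DigitTagger are hypothetical names made up for illustration; they are far simpler than the GATE classes they stand in for, and the tagger here handles digits only, not number words.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-ins for GATE's document, annotation and pipeline concepts; not GATE classes. */
class MiniPipeline {

    /** A typed span over the text, optionally carrying a numeric value. */
    static class Span {
        final String type;
        final int start;
        final int end;
        final Long value;

        Span(String type, int start, int end, Long value) {
            this.type = type;
            this.start = start;
            this.end = end;
            this.value = value;
        }
    }

    /** A text plus the spans that stages have attached to it. */
    static class Doc {
        final String text;
        final List<Span> spans = new ArrayList<Span>();

        Doc(String text) {
            this.text = text;
        }

        /** Returns a fresh list of the spans of one type, in insertion order. */
        List<Span> ofType(String type) {
            List<Span> out = new ArrayList<Span>();
            for (Span s : spans) {
                if (s.type.equals(type)) {
                    out.add(s);
                }
            }
            return out;
        }
    }

    /** One processing step over a document. */
    interface Stage {
        void run(Doc doc);
    }

    /** Marks every maximal run of non-whitespace characters as a Token span. */
    static class WhitespaceTokeniser implements Stage {
        public void run(Doc doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                int start = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    doc.spans.add(new Span("Token", start, i, null));
                }
            }
        }
    }

    /** Tags all-digit tokens as Number spans; relies on the tokeniser having run first. */
    static class DigitTagger implements Stage {
        public void run(Doc doc) {
            for (Span t : doc.ofType("Token")) {
                String s = doc.text.substring(t.start, t.end);
                if (s.matches("[0-9]{1,18}")) { // at most 18 digits always fits in a long
                    doc.spans.add(new Span("Number", t.start, t.end, Long.valueOf(s)));
                }
            }
        }
    }

    /** Runs the stages in order over every document, as a serial controller would. */
    static void execute(List<Stage> stages, List<Doc> corpus) {
        for (Doc doc : corpus) {
            for (Stage stage : stages) {
                stage.run(doc);
            }
        }
    }

    public static void main(String[] args) {
        List<Stage> stages = new ArrayList<Stage>();
        stages.add(new WhitespaceTokeniser());
        stages.add(new DigitTagger());
        List<Doc> corpus = new ArrayList<Doc>();
        corpus.add(new Doc("GATE 7 ships 2 taggers"));
        execute(stages, corpus);
        for (Span n : corpus.get(0).ofType("Number")) {
            System.out.println("Number at " + n.start + "-" + n.end + ": " + n.value);
        }
    }
}
```

The order of the stages matters: if DigitTagger ran before WhitespaceTokeniser it would find no Token spans and tag nothing, which is the same reason the tokeniser and sentence splitter come before the number tagger in the GATE pipeline.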