




GATE Introduction

GATE is open source software written in Java, mainly used to extract information from unstructured data. GATE components mainly support the following capabilities:

· Natural language processing (NLP).
· Information extraction in many languages.

GATE components:

GATE mainly works on textual resources such as Word documents, text documents, HTML, PDF and XML. It provides various built-in plugins, and the easiest way to start using GATE is the GATE Developer GUI. The GATE components are:

· Language Resources (LRs): corpus (a set of documents), documents and annotations.
· Processing Resources (PRs): for example the ANNIE components, which are supplied as GATE plugins.
· Application: a combination of language resources and processing resources, in which a sequence of processing resources is applied to the language resources. Visual Resources (VRs), by contrast, are the GUI components used to view and edit the other resources.

Use case

Let's walk through a use case: how to use the GATE Developer GUI, and how to create a sample GATE application that uses the GATE plugins to extract meaningful information from unstructured content.

Prerequisites:

· Install JDK 1.6+
· Install GATE software (GATE 7.1): http://gate.ac.uk/download/

Solution:

To handle this use case, the following plugins are used to show the functionality of the GATE system.

ANNIE plugin: This plugin supports the following functionality.

· Default tokeniser: tokenises the sentence.
· Sentence splitter: splits the text into sentences based on punctuation.
· Gazetteer lookup

Number Tagger plugin:

· This plugin finds numbers written in words or in digits and annotates them with their numeric values.

GATE Developer GUI:

Launch the GATE application from the installation directory.

STEPS:

· Select the File menu and click “Manage plugins” to load the required plugins. Load the ANNIE plugin, the JAPE Plus Transducer plugin and the Tagger_Numbers plugin.
· Select the language resources and create the GATE document.
· Select the processing resources and load the ANNIE sentence splitter and English tokeniser. Then load the Number tagger processing resource.
· Select the application option and create a sample pipeline application. Add the processing resources, then select the language resources and run the application.

The annotation sets and the annotation list highlight the output from the plugins. The Number tagger plugin creates “Number” annotations, which mark numbers written in words or in digits and record their numeric values.

GATE Embedded Sample:

Create a sample Java project as shown in the screenshot.

GATE jars:

Copy gate.jar from the bin directory ($GATE_HOME/bin) and the NumbersTagger jar from $GATE_HOME/plugins/Tagger_Numbers. The other supporting jars are available in the lib folder of the GATE installation directory ($GATE_HOME/lib).

Steps:

· Initialize the GATE system. This relies on the GATE_HOME environment variable: create the system variable GATE_HOME and point it to the GATE installation directory.
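Because initialization depends on GATE_HOME, it can help to validate the variable before passing it to GATE. The following is a minimal sketch that uses only the Java standard library. The class GateHomeResolver and its resolve method are hypothetical helpers for illustration, not part of the GATE API.

```java
import java.io.File;

public class GateHomeResolver {

    /**
     * Hypothetical helper: converts the raw GATE_HOME value into a directory
     * and fails early with a readable message if the value is unusable.
     */
    public static File resolve(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            throw new IllegalStateException("GATE_HOME is not set");
        }
        File home = new File(rawValue.trim());
        if (!home.isDirectory()) {
            throw new IllegalStateException("GATE_HOME is not a directory: " + home);
        }
        return home;
    }

    public static void main(String[] args) {
        try {
            File home = resolve(System.getenv("GATE_HOME"));
            System.out.println("GATE home: " + home.getAbsolutePath());
        } catch (IllegalStateException e) {
            // Report the configuration problem instead of a NullPointerException later on.
            System.err.println("Cannot start GATE: " + e.getMessage());
        }
    }
}
```

The returned File could then be handed to Gate.setGateHome(...) in place of the unchecked new File(System.getenv("GATE_HOME")) call.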
GateMain.java

    package com.example.gate.main;

    import gate.Annotation;
    import gate.AnnotationSet;
    import gate.Factory;
    import gate.Gate;
    import gate.creole.ANNIEConstants;
    import gate.creole.SerialAnalyserController;
    import gate.creole.numbers.AnnotationConstants;
    import gate.util.GateException;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.math.BigDecimal;
    import java.net.MalformedURLException;
    import java.util.Iterator;
    import java.util.TreeSet;

    public class GateMain {

        private SerialAnalyserController controller;
        private String inputASname;

        /**
         * Main method to execute simple GATE application
         *
         * @param args
         *            Jun 29, 2014
         */
        public static void main(String[] args) {
            GateMain m = new GateMain();
            m.initializeGate();
            m.process();
        }

        /**
         * This method is used to add the documents to the GATE corpus and get the
         * values from the annotation based on the annotation name.
         *
         * Jun 29, 2014
         */
        private void process() {
            try {
                System.out.println("Reading the input files and adding the document to the corpus ...");
                String input = readInput();
                System.out.println("Input..." + input);
                controller.getCorpus().add(Factory.newDocument(input));
                System.out.println("GATE controller executing...");
                controller.execute();
                Iterator<gate.Document> documentIterator = controller.getCorpus().iterator();
                // iterating the document features
                while (documentIterator.hasNext()) {
                    gate.Document currDoc = (gate.Document) documentIterator.next();
                    gate.Document doc = currDoc;
                    doc.getFeatures().clear();
                    // use the default annotation set unless a named set was configured
                    AnnotationSet inputAnnSet = (inputASname == null || inputASname.length() == 0)
                            ? doc.getAnnotations()
                            : doc.getAnnotations(inputASname);
                    Iterator<Annotation> numberIterator =
                            inputAnnSet.get(AnnotationConstants.NUMBER_ANNOTATION_NAME).iterator();
                    // iterating the annotation features
                    while (numberIterator.hasNext()) {
                        Annotation numberAnnotation = (Annotation) numberIterator.next();
                        BigDecimal bd = new BigDecimal(String.valueOf(
                                numberAnnotation.getFeatures().get(AnnotationConstants.VALUE_FEATURE_NAME)));
                        System.out.println("Number tagger Plugin output: " + bd.longValue());
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /**
         * This method is used to read the input from input directory.
         *
         * @return Jun 29, 2014
         * @throws IOException
         */
        private String readInput() throws IOException {
            String line = "";
            StringBuilder fileContent = new StringBuilder();
            // the input file is expected at <working directory>/input/input.txt
            StringBuilder input = new StringBuilder();
            input.append(System.getProperty("user.dir"));
            input.append(File.separator).append("input");
            input.append(File.separator).append("input.txt");
            FileReader reader = new FileReader(new File(input.toString()));
            BufferedReader br = new BufferedReader(reader);
            try {
                while ((line = br.readLine()) != null) {
                    fileContent.append(line).append("\n");
                }
            } finally {
                br.close(); // release the file handle even if reading fails
            }
            return fileContent.toString();
        }

        /**
         * This method is used to initialize the GATE.
         *
         * STEP1: Initialize the GATE home. STEP2: Initialize the GATE controller
         * STEP3: Register the processing resources.
         *
         * Jun 29, 2014
         */
        @SuppressWarnings("deprecation")
        private void initializeGate() {
            try {
                System.out.println("Initializing the GATE ...");
                Gate.setGateHome(new File(System.getenv("GATE_HOME")));
                Gate.init();
                String taggerNumber = Gate.getPluginsHome() + File.separator + "Tagger_Numbers";
                String japeHome = Gate.getPluginsHome() + File.separator + "JAPE_Plus";
                // registering the processing resources.
                System.out.println("Registering the resources ...");
                Gate.getCreoleRegister().registerDirectories(new File(japeHome).toURL());
                Gate.getCreoleRegister().registerDirectories(new File(taggerNumber).toURL());
                controller = (SerialAnalyserController) Factory
                        .createResource("gate.creole.SerialAnalyserController");
                String[] processingResources = {
                        "gate.creole.tokeniser.DefaultTokeniser",
                        "gate.creole.splitter.SentenceSplitter",
                        "gate.creole.numbers.NumbersTagger" };
                // adding the processing to the GATE controller
                for (int pr = 0; pr < processingResources.length; pr++) {
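The listing registers the plugin directories through File.toURL(), which is deprecated in the JDK because it does not escape characters that are illegal in URLs, such as spaces. A hedged sketch of the usual replacement, File.toURI().toURL(), follows. It uses only the Java standard library; the class PluginUrls, the method pluginUrl and the sample path "gate home" are hypothetical and chosen only to show the escaping.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

public class PluginUrls {

    /**
     * Hypothetical helper: builds the URL of a plugin directory located
     * below the given plugins home, escaping special characters on the way.
     */
    public static URL pluginUrl(File pluginsHome, String pluginName) {
        File pluginDir = new File(pluginsHome, pluginName);
        try {
            // toURI() escapes characters such as spaces; toURL() alone would not.
            return pluginDir.toURI().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalStateException("Cannot build URL for " + pluginDir, e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample location containing a space, for illustration only.
        File pluginsHome = new File("gate home", "plugins");
        System.out.println(pluginUrl(pluginsHome, "Tagger_Numbers"));
    }
}
```

With this approach the resulting URL could be passed to Gate.getCreoleRegister().registerDirectories(...) as in the listing, and the @SuppressWarnings("deprecation") annotation would no longer be needed for these calls.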