GATE Introduction

GATE is open source software written in Java, used mainly to extract information from unstructured data. GATE components mainly support the following capabilities:

· Natural language processing (NLP).
· Information extraction in many languages.

GATE components:

GATE mainly acts on textual resources such as Word documents, text documents, HTML, PDF and XML. It provides different types of built-in plugins, and it is easy to start using GATE from the GATE Developer GUI. The following are the GATE components.

· Language Resources (LRs): corpus (set of documents), document, annotations.
· Processing Resources (PRs): ANNIE (GATE plugins).
· Application: a combination of language resources and processing resources, in which a sequence of processing resources is applied to the language resources.
· Visual Resources (VRs): the GUI components used to view and edit the other resources.

Use case

Let's walk through a use case: how to use the GATE Developer GUI, and how to create a sample GATE application that uses the GATE plugins to extract meaningful information from unstructured content.

Prerequisites:

· Install JDK 1.6+
· Install GATE software (GATE 7.1): http://gate.ac.uk/download/

Solution:

To handle this use case, the following plugins are used to show the functionality of the GATE system.

ANNIE plugin: this plugin supports the following functionality.

· Default tokeniser: tokenises the sentence.
· Sentence splitter: splits the text into sentences based on the punctuation.
· Gazetteer lookup.

Number Tagger plugin:

· This plugin finds numbers written both in words and in digits, and annotates them with their numeric values.

GATE Developer GUI:

Launch the GATE application from the installation directory.

STEPS:

· Select the File menu and click “Manage plugins”. This is used to load the required plugins. Load the ANNIE plugin, the JAPE Plus Transducer plugin and the Tagger_Numbers plugin.
· Select the language resources and create the GATE document.
· Select the processing resources and load the ANNIE sentence splitter and the English tokeniser. Then load the Number Tagger processing resource.
· Select the application option and create a sample pipeline application. Add the processing resources, then select the language resources and run the application.

The annotation sets and the annotation list highlight the output from the plugins. The Number Tagger plugin creates “Number” annotations, which mark numbers written in words or in digits and carry the numeric value as a feature.

GATE Embedded Sample:

Create a sample Java project laid out as shown in the screenshot.

GATE jars:

Copy gate.jar from the bin directory ($GATE_HOME/bin) and copy the NumbersTagger jar from $GATE_HOME/plugins/Tagger_Numbers. The other supporting jars are available in the lib folder of the GATE installation directory ($GATE_HOME/lib).

Steps:

· Initialize the GATE system. This requires the GATE_HOME environment variable: create the system variable GATE_HOME and point it at the GATE installation directory.
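Because the embedded sample depends on GATE_HOME being set, it can help to check the variable before handing it to GATE. The following is a minimal sketch that uses only the JDK. The class name GateHomeCheck and the method resolveGateHome are hypothetical names chosen for illustration and are not part of the GATE API; the plugins and lib folder names are taken from the jar locations described above.

```java
import java.io.File;

public class GateHomeCheck {

    /**
     * Returns the directory named by the given value (normally the content of
     * the GATE_HOME environment variable). Throws IllegalStateException when
     * the value is missing or does not point at an existing directory.
     */
    public static File resolveGateHome(String gateHomeValue) {
        if (gateHomeValue == null || gateHomeValue.trim().isEmpty()) {
            throw new IllegalStateException("GATE_HOME is not set");
        }
        File home = new File(gateHomeValue.trim());
        if (!home.isDirectory()) {
            throw new IllegalStateException("GATE_HOME is not a directory: " + home);
        }
        return home;
    }

    public static void main(String[] args) {
        try {
            File home = resolveGateHome(System.getenv("GATE_HOME"));
            System.out.println("GATE home: " + home.getAbsolutePath());
            // Folders referred to in the text, resolved relative to GATE_HOME.
            System.out.println("plugins:   " + new File(home, "plugins"));
            System.out.println("lib:       " + new File(home, "lib"));
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Failing early with a clear message is easier to diagnose than the NullPointerException that new File(System.getenv("GATE_HOME")) raises when the variable is absent.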
GateMain.java

    package com.example.gate.main;

    import gate.Annotation;
    import gate.AnnotationSet;
    import gate.Factory;
    import gate.Gate;
    import gate.creole.ANNIEConstants;
    import gate.creole.SerialAnalyserController;
    import gate.creole.numbers.AnnotationConstants;
    import gate.util.GateException;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.math.BigDecimal;
    import java.net.MalformedURLException;
    import java.util.Iterator;
    import java.util.TreeSet;

    public class GateMain {

        private SerialAnalyserController controller;
        private String inputASname;

        /**
         * Main method to execute simple GATE application
         *
         * @param args
         * Jun 29, 2014
         */
        public static void main(String[] args) {
            GateMain m = new GateMain();
            m.initializeGate();
            m.process();
        }

        /**
         * This method is used to add the documents to the GATE corpus and get the
         * values from the annotation based on the annotation name.
         *
         * Jun 29, 2014
         */
        private void process() {
            try {
                System.out.println("Reading the input files and adding the document to the corpus.");
                String input = readInput();
                System.out.println("Input: " + input);
                controller.getCorpus().add(Factory.newDocument(input));
                System.out.println("GATE controller executing.");
                controller.execute();
                Iterator<gate.Document> documentIterator = controller.getCorpus().iterator();
                // iterating the document features
                while (documentIterator.hasNext()) {
                    gate.Document doc = documentIterator.next();
                    doc.getFeatures().clear();
                    // use the default annotation set unless a set name was given
                    AnnotationSet inputAnnSet = (inputASname == null || inputASname.length() == 0)
                            ? doc.getAnnotations()
                            : doc.getAnnotations(inputASname);
                    Iterator<Annotation> numberIterator =
                            inputAnnSet.get(AnnotationConstants.NUMBER_ANNOTATION_NAME).iterator();
                    // iterating the annotation features
                    while (numberIterator.hasNext()) {
                        Annotation numberAnnotation = numberIterator.next();
                        BigDecimal bd = new BigDecimal(String.valueOf(
                                numberAnnotation.getFeatures().get(AnnotationConstants.VALUE_FEATURE_NAME)));
                        System.out.println("Number tagger Plugin output: " + bd.longValue());
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        /**
         * This method is used to read the input from input directory.
         *
         * @return the content of input/input.txt under the working directory
         * Jun 29, 2014
         * @throws IOException
         */
        private String readInput() throws IOException {
            String line = "";
            StringBuilder fileContent = new StringBuilder();
            StringBuilder input = new StringBuilder();
            input.append(System.getProperty("user.dir"));
            input.append(File.separator).append("input");
            input.append(File.separator).append("input.txt");
            FileReader reader = new FileReader(new File(input.toString()));
            BufferedReader br = new BufferedReader(reader);
            try {
                while ((line = br.readLine()) != null) {
                    fileContent.append(line).append("\n");
                }
            } finally {
                // release the file handle even if reading fails
                br.close();
            }
            return fileContent.toString();
        }

        /**
         * This method is used to initialize the GATE.
         *
         * STEP1: Initialize the GATE home. STEP2: Initialize the GATE controller
         * STEP3: Register the processing resources.
         *
         * Jun 29, 2014
         */
        @SuppressWarnings("deprecation")
        private void initializeGate() {
            try {
                System.out.println("Initializing the GATE.");
                Gate.setGateHome(new File(System.getenv("GATE_HOME")));
                Gate.init();
                String taggerNumber = Gate.getPluginsHome() + File.separator + "Tagger_Numbers";
                String japeHome = Gate.getPluginsHome() + File.separator + "JAPE_Plus";
                // registering the processing resources.
                System.out.println("Registering the resources.");
                Gate.getCreoleRegister().registerDirectories(new File(japeHome).toURL());
                Gate.getCreoleRegister().registerDirectories(new File(taggerNumber).toURL());
                controller = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
                String[] processingResources = {
                        "gate.creole.tokeniser.DefaultTokeniser",
                        "gate.creole.splitter.SentenceSplitter",
                        "gate.creole.numbers.NumbersTagger" };
                // adding the processing to the GATE controller
                for (int pr = 0; pr < processingResources.length; pr++) {
                    ...
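On Java 7 or later, the file-reading step of the listing (readInput) can also be written with try-with-resources, which closes the reader automatically. The sketch below is one way to do that with the JDK only. The class name InputReader is hypothetical, and reading the file as UTF-8 is an assumption: the listing itself uses FileReader, which falls back to the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class InputReader {

    /** Reads the whole file as UTF-8 text (an assumption), ending every line with "\n". */
    public static String readInput(Path file) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                content.append(line).append('\n');
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Same location as in the listing: <working dir>/input/input.txt
        Path file = Paths.get(System.getProperty("user.dir"), "input", "input.txt");
        if (Files.exists(file)) {
            System.out.print(readInput(file));
        } else {
            System.err.println("No input file at " + file);
        }
    }
}
```

The returned string can be passed to Factory.newDocument in the same way as the result of readInput in the listing. Since the prerequisites allow JDK 1.6, this variant only applies when the project targets Java 7 or newer.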