GATE Introduction
GATE is open source software written in Java, used mainly to extract information from unstructured data. GATE components mainly support the following capabilities:
· Natural language processing (NLP).
· Information extraction in many languages.

GATE components:
GATE mainly works on textual resources such as Word documents, text documents, HTML, PDF and XML. It provides several types of built-in plugins, and the easiest way to start is from the GATE Developer GUI. The GATE components are:
· Language Resources (LRs): corpus (a set of documents), document, annotations.
· Processing Resources (PRs): ANNIE (GATE plugins).
· Application: a combination of language resources and processing resources, in which a sequence of processing resources is applied to the language resources. In GATE such an application is also called a pipeline (a controller in the API).

Use case
This use case shows how to use the GATE Developer GUI and how to create a sample GATE application that uses GATE plugins to extract meaningful information from unstructured content.

Pre requisite:
· Install JDK 1.6+
· Install GATE software (GATE 7.1): http://gate.ac.uk/download/

Solution:
The following plugins are used to show the functionality of the GATE system.
ANNIE plugin: This plugin supports the following functionality.
· Default tokeniser: tokenises the sentence.
· Sentence splitter: splits the text into sentences based on punctuation.
· Gazetteer lookup.
Number Tagger plugin:
· This plugin finds numbers written either in words or in digits and annotates them with their numeric values.

GATE Developer GUI:
Launch the GATE application from the installation directory.
STEPS:
· Select the File menu and click "Manage plugins" to load the required plugins. Load the ANNIE plugin, the JAPE Plus Transducer plugin and the Tagger_Numbers plugin.
· Select the language resources and create the GATE document.
· Select the processing resources and load the ANNIE sentence splitter and English tokeniser. Then load the Number Tagger processing resource.
· Select the application option and create a sample pipeline application. Add the processing resources, then select the language resources and run the application.
The annotation sets and the annotation list highlight the output from the plugins. The Number Tagger plugin creates the "Number" annotation, which marks numbers written in words or in digits with their numeric values.

GATE Embedded Sample:
Create a sample Java project as shown in the screenshot.
GATE jars:
Copy gate.jar from the bin directory ($GATE_HOME\bin) and the NumbersTagger jar from $GATE_HOME\plugins\Tagger_Numbers. The other supporting jars are in the lib folder of the GATE installation directory ($GATE_HOME\lib).
Steps:
· Initialize the GATE system. This requires the GATE_HOME environment variable: create the system variable GATE_HOME and point it to the GATE installation directory.

GateMain.java
package com.example.gate.main;

import gate.Annotation;
import gate.AnnotationSet;
import gate.Factory;
import gate.Gate;
import gate.creole.ANNIEConstants;
import gate.creole.SerialAnalyserController;
import gate.creole.numbers.AnnotationConstants;
import gate.util.GateException;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.net.MalformedURLException;
import java.util.Iterator;
import java.util.TreeSet;

public class GateMain {

    private SerialAnalyserController controller;
    private String inputASname;

    /**
     * Main method to execute simple GATE application
     *
     * @param args Jun 29, 2014
     */
    public static void main(String[] args) {
        GateMain m = new GateMain();
        m.initializeGate();
        m.process();
    }

    /**
     * This method is used to add the documents to the GATE corpus and get the
     * values from the annotation based on the annotation name.
     *
     * Jun 29, 2014
     */
    private void process() {
        try {
            System.out.println("Reading the input files and adding the document to the corpus ...");
            String input = readInput();
            System.out.println("Input..." + input);
            controller.getCorpus().add(Factory.newDocument(input));
            System.out.println("GATE controller executing...");
            controller.execute();
            Iterator<gate.Document> documentIterator = controller.getCorpus().iterator();
            // iterating the document features
            while (documentIterator.hasNext()) {
                gate.Document currDoc = (gate.Document) documentIterator.next();
                gate.Document doc = currDoc;
                doc.getFeatures().clear();
                // use the default annotation set unless a named set was configured
                AnnotationSet inputAnnSet = (inputASname == null || inputASname.length() == 0)
                        ? doc.getAnnotations() : doc.getAnnotations(inputASname);
                Iterator<Annotation> numberIterator =
                        inputAnnSet.get(AnnotationConstants.NUMBER_ANNOTATION_NAME).iterator();
                // iterating the annotation features
                while (numberIterator.hasNext()) {
                    Annotation numberAnnotation = (Annotation) numberIterator.next();
                    BigDecimal bd = new BigDecimal(String.valueOf(
                            numberAnnotation.getFeatures().get(AnnotationConstants.VALUE_FEATURE_NAME)));
                    System.out.println("Number tagger Plugin output: " + bd.longValue());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * This method is used to read the input from input directory.
     *
     * @return Jun 29, 2014
     * @throws IOException
     */
    private String readInput() throws IOException {
        String line = "";
        StringBuilder fileContent = new StringBuilder();
        // the input file is expected at <working directory>/input/input.txt
        StringBuilder input = new StringBuilder();
        input.append(System.getProperty("user.dir"));
        input.append(File.separator).append("input");
        input.append(File.separator).append("input.txt");
        FileReader reader = new FileReader(new File(input.toString()));
        BufferedReader br = new BufferedReader(reader);
        while ((line = br.readLine()) != null) {
            fileContent.append(line).append("\n");
        }
        return fileContent.toString();
    }

    /**
     * This method is used to initialize the GATE.
     *
     * STEP1: Initialize the GATE home. STEP2: Initialize the GATE controller
     * STEP3: Register the processing resources.
     *
     * Jun 29, 2014
     */
    @SuppressWarnings("deprecation")
    private void initializeGate() {
        try {
            System.out.println("Initializing the GATE ...");
            Gate.setGateHome(new File(System.getenv("GATE_HOME")));
            Gate.init();
            String taggerNumber = Gate.getPluginsHome() + File.separator + "Tagger_Numbers";
            String japeHome = Gate.getPluginsHome() + File.separator + "JAPE_Plus";
            // registering the processing resources.
            System.out.println("Registering the resources ...");
            Gate.getCreoleRegister().registerDirectories(new File(japeHome).toURL());
            Gate.getCreoleRegister().registerDirectories(new File(taggerNumber).toURL());
            controller = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
            String[] processingResources = {
                "gate.creole.tokeniser.DefaultTokeniser",
                "gate.creole.splitter.SentenceSplitter",
                "gate.creole.numbers.NumbersTagger" };
            // adding the processing to the GATE controller
            for (int pr = 0; pr < processingResources.length; pr++) {
            ...
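
The initialization step in the listing depends on GATE_HOME pointing to a real installation, and it calls File.toURL(), which the JDK has deprecated in favour of File.toURI().toURL(). The following is a minimal sketch of a pre-flight check, assuming only plain JDK classes and the $GATE_HOME\plugins layout described earlier. The class GateHomeCheck and its methods are hypothetical names chosen for illustration; they are not part of GATE or of the original sample.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper (not part of GATE): checks GATE_HOME before Gate.init() is called.
class GateHomeCheck {

    // Returns the GATE home directory, or throws if the value is missing or not a directory.
    static File resolveGateHome(String gateHomeValue) {
        if (gateHomeValue == null || gateHomeValue.trim().length() == 0) {
            throw new IllegalStateException("GATE_HOME is not set");
        }
        File home = new File(gateHomeValue);
        if (!home.isDirectory()) {
            throw new IllegalStateException("GATE_HOME is not a directory: " + home);
        }
        return home;
    }

    // Resolves <gateHome>/plugins/<pluginName> to a URL without the deprecated File.toURL().
    static URL pluginUrl(File gateHome, String pluginName) throws MalformedURLException {
        File pluginDir = new File(new File(gateHome, "plugins"), pluginName);
        if (!pluginDir.isDirectory()) {
            throw new IllegalStateException("Plugin directory not found: " + pluginDir);
        }
        return pluginDir.toURI().toURL();
    }

    public static void main(String[] args) throws MalformedURLException {
        try {
            File home = resolveGateHome(System.getenv("GATE_HOME"));
            System.out.println("GATE home: " + home);
            System.out.println("Tagger_Numbers: " + pluginUrl(home, "Tagger_Numbers"));
        } catch (IllegalStateException e) {
            // Report the problem instead of failing later inside GATE initialization.
            System.err.println("Cannot initialize GATE: " + e.getMessage());
        }
    }
}
```

A check of this kind could run at the start of initializeGate(), so that a missing environment variable or plugin folder is reported with a clear message rather than surfacing later as an exception during GATE start-up.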