GATE Introduction

GATE is open source software written in Java, used mainly to extract information from unstructured data. GATE components mainly support the following capabilities:
· Natural language processing (NLP).
· Information extraction in many languages.

GATE components:
GATE mainly acts on textual resources, namely Word documents, text documents, HTML, PDF and XML. It provides different types of built-in plugins. It is easy to start using GATE from the GATE Developer GUI. The following are the GATE components.
· Language Resources (LRs): corpus (a set of documents), document, annotations.
· Processing Resources (PRs): ANNIE (GATE plugins).
· Application: a combination of language resources and processing resources, in which a sequence of processing resources is applied to the language resources. (Visual Resources, by contrast, are the GUI components used to view and edit LRs and PRs.)

Use case
Let's work through a use case with GATE: how to use the GATE Developer GUI, and how to create a sample GATE application that uses the GATE plugins to extract meaningful information from unstructured content.

Pre requisite:
· Install JDK 1.6+
· Install GATE software (GATE 7.1) http://gate.ac.uk/download/

Solution:
To handle this use case, the following plugins are used to show the functionality of the GATE system.

ANNIE plugin:
This plugin supports the following functionalities.
· Default tokenizer: tokenizes the sentence.
· Sentence splitter: splits the text into sentences based on the punctuation.
· Gazetteer lookup

Number Tagger Plugin:
· This plugin finds numbers written in words or in digits and annotates them with their numeric values.

GATE Developer GUI:
Launch the GATE application from the installation directory.

STEPS:
· Select the File menu and click “Manage plugins”. This is used to load the required plugins. Load the ANNIE plugin, the JAPE_Plus transducer plugin and the Tagger_Numbers plugin.
· Select the language resources and create the GATE document.
· Select the processing resources and load the ANNIE sentence splitter and English tokenizer. Now load the Number Tagger processing resource.
· Select the application option and create a sample pipeline application. Add the processing resources, then select the language resources and run the application.

The annotation sets and the annotation list highlight the output from the plugins. The Number Tagger plugin creates the “Number” annotation, which marks numbers written in words or in digits and records their numeric values.

GATE Embedded Sample:
Create a sample Java project as shown in the screenshot.

GATE jars:
Copy gate.jar from the bin directory ($GATE_HOME\bin) and copy the NumbersTagger jar from $GATE_HOME\plugins\Tagger_Numbers. The other supporting jars are available in the lib folder of the GATE installation directory ($GATE_HOME\lib).

· Steps:
Initialize the GATE system. This depends on the GATE_HOME environment variable: create the system variable GATE_HOME and point it to the GATE installation directory.

GateMain.java

package com.example.gate.main;

import gate.Annotation;
import gate.AnnotationSet;
import gate.Factory;
import gate.Gate;
import gate.creole.ANNIEConstants;
import gate.creole.SerialAnalyserController;
import gate.creole.numbers.AnnotationConstants;
import gate.util.GateException;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigDecimal;
import java.net.MalformedURLException;
import java.util.Iterator;
import java.util.TreeSet;

public class GateMain {

    private SerialAnalyserController controller;
    private String inputASname;

    /**
     * Main method to execute simple GATE application
     *
     * @param args command-line arguments (unused)
     * Jun 29, 2014
     */
    public static void main(String[] args) {
        GateMain m = new GateMain();
        m.initializeGate();
        m.process();
    }

    /**
     * This method is used to add the documents to the GATE corpus and get the
     * values from the annotation based on the annotation name.
     *
     * Jun 29, 2014
     */
    private void process() {
        try {
            System.out.println("Reading the input files and adding the document to the corpus ...");
            String input = readInput();
            System.out.println("Input: " + input);
            controller.getCorpus().add(Factory.newDocument(input));
            System.out.println("GATE controller executing ...");
            controller.execute();
            Iterator<gate.Document> documentIterator = controller.getCorpus().iterator();
            // iterating the document features
            while (documentIterator.hasNext()) {
                gate.Document doc = documentIterator.next();
                doc.getFeatures().clear();
                // use the default annotation set unless a named input set was configured
                AnnotationSet inputAnnSet =
                        (inputASname == null || inputASname.length() == 0) ? doc.getAnnotations() : doc.getAnnotations(inputASname);
                Iterator<Annotation> numberIterator = inputAnnSet.get(AnnotationConstants.NUMBER_ANNOTATION_NAME).iterator();
                // iterating the annotation features
                while (numberIterator.hasNext()) {
                    Annotation numberAnnotation = numberIterator.next();
                    BigDecimal bd = new BigDecimal(String.valueOf(numberAnnotation.getFeatures().get(AnnotationConstants.VALUE_FEATURE_NAME)));
                    System.out.println("Number tagger Plugin output: " + bd.longValue());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    /**
     * This method is used to read the input from input directory.
     *
     * @return the content of input/input.txt under the working directory
     * Jun 29, 2014
     * @throws IOException if the file cannot be read
     */
    private String readInput() throws IOException {
        String line = "";
        StringBuilder fileContent = new StringBuilder();
        StringBuilder input = new StringBuilder();
        input.append(System.getProperty("user.dir"));
        input.append(File.separator).append("input");
        input.append(File.separator).append("input.txt");
        FileReader reader = new FileReader(new File(input.toString()));
        BufferedReader br = new BufferedReader(reader);
        try {
            while ((line = br.readLine()) != null) {
                fileContent.append(line).append("\n");
            }
        } finally {
            // close the reader even if reading fails (JDK 1.6 compatible)
            br.close();
        }
        return fileContent.toString();
    }

    /**
     * This method is used to initialize the GATE.
     *
     * STEP1: Initialize the GATE home. STEP2: Initialize the GATE controller
     * STEP3: Register the processing resources.
     *
     * Jun 29, 2014
     */
    @SuppressWarnings("deprecation")
    private void initializeGate() {
        try {
            System.out.println("Initializing the GATE ...");
            Gate.setGateHome(new File(System.getenv("GATE_HOME")));
            Gate.init();
            String taggerNumber = Gate.getPluginsHome() + File.separator + "Tagger_Numbers";
            String japeHome = Gate.getPluginsHome() + File.separator + "JAPE_Plus";
            // registering the processing resources.
            System.out.println("Registering the resources ...");
            // toURI().toURL() instead of File.toURL(), which does not escape spaces in paths
            Gate.getCreoleRegister().registerDirectories(new File(japeHome).toURI().toURL());
            Gate.getCreoleRegister().registerDirectories(new File(taggerNumber).toURI().toURL());
            // Assumption (not in the original listing): the tokeniser and sentence splitter
            // belong to the ANNIE plugin, so that plugin is registered here as well.
            Gate.getCreoleRegister().registerDirectories(
                    new File(Gate.getPluginsHome(), ANNIEConstants.PLUGIN_DIR).toURI().toURL());
            controller = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
            String[] processingResources =
                    { "gate.creole.tokeniser.DefaultTokeniser", "gate.creole.splitter.SentenceSplitter", "gate.creole.numbers.NumbersTagger" };
            // adding the processing to the GATE controller
            for (int pr = 0; pr < processingResources.length; pr++) {
                // NOTE: the original listing breaks off inside this loop. Everything from
                // here to the end of the class is an assumed completion, not the author's code.
                controller.add((gate.ProcessingResource) Factory.createResource(processingResources[pr]));
            }
            // Assumption: process() adds documents to controller.getCorpus(), so a corpus is set here.
            controller.setCorpus(Factory.newCorpus("corpus"));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
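
The initializeGate() method in the listing passes System.getenv("GATE_HOME") straight to new File(...), which throws a NullPointerException when the variable is not set. A minimal sketch of a guard that could run first is given below. The class GateHomeCheck and the method resolveGateHome are hypothetical names that are not part of GATE, and the check for a plugins folder is an assumption based on the listing's use of Gate.getPluginsHome().

```java
import java.io.File;

public class GateHomeCheck {

    /**
     * Hypothetical helper (not part of GATE): turns the raw GATE_HOME value
     * into a directory, failing early with a readable message.
     */
    public static File resolveGateHome(String rawValue) {
        if (rawValue == null || rawValue.trim().isEmpty()) {
            throw new IllegalStateException("GATE_HOME is not set");
        }
        File home = new File(rawValue.trim());
        if (!home.isDirectory()) {
            throw new IllegalStateException("GATE_HOME is not a directory: " + home);
        }
        // Assumption: a usable installation has a "plugins" folder under GATE_HOME.
        if (!new File(home, "plugins").isDirectory()) {
            throw new IllegalStateException("No plugins folder under " + home);
        }
        return home;
    }

    public static void main(String[] args) {
        File home = resolveGateHome(System.getenv("GATE_HOME"));
        System.out.println("Using GATE_HOME = " + home.getAbsolutePath());
        // The returned directory would then be handed to Gate.setGateHome(home)
        // before Gate.init(), as in the listing.
    }
}
```

With a check like this, a missing or mistyped GATE_HOME is reported before any GATE class is touched, which tends to be easier to diagnose than a stack trace from inside the initialization.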

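
To make the role of the Number Tagger more concrete: it finds numbers written in words or in digits and attaches one numeric value to each. The following stand-alone sketch illustrates that idea only. It is not the plugin's implementation; the class NumberWordsSketch, its word table and its parsing rules are all assumptions made for illustration, and it handles only small English cardinal numbers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class NumberWordsSketch {

    // Illustrative word table (an assumption, not taken from the plugin).
    private static final Map<String, Long> WORDS = new HashMap<String, Long>();
    static {
        String[] units = { "zero", "one", "two", "three", "four", "five", "six", "seven",
                "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" };
        for (int i = 0; i < units.length; i++) {
            WORDS.put(units[i], (long) i);
        }
        String[] tens = { "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety" };
        for (int i = 0; i < tens.length; i++) {
            WORDS.put(tens[i], (i + 2) * 10L);
        }
    }

    /** Returns the numeric value of a digit string or a simple English number phrase. */
    public static long parse(String text) {
        String cleaned = text.trim().toLowerCase(Locale.ENGLISH).replace(",", "");
        if (cleaned.matches("\\d+")) {
            return Long.parseLong(cleaned); // already written in digits
        }
        long total = 0;   // value of completed "thousand" groups
        long current = 0; // value of the group being built
        for (String word : cleaned.split("[\\s-]+")) {
            if (word.equals("and")) {
                continue;
            } else if (word.equals("hundred")) {
                current = (current == 0 ? 1 : current) * 100;
            } else if (word.equals("thousand")) {
                total += (current == 0 ? 1 : current) * 1000;
                current = 0;
            } else {
                Long value = WORDS.get(word);
                if (value == null) {
                    throw new IllegalArgumentException("Not a number word: " + word);
                }
                current += value;
            }
        }
        return total + current;
    }

    public static void main(String[] args) {
        System.out.println(parse("1,250"));                                // prints 1250
        System.out.println(parse("twenty-five"));                          // prints 25
        System.out.println(parse("three thousand two hundred and seven")); // prints 3207
    }
}
```

In the GATE application the equivalent value is read from the annotation's value feature, as process() does in the listing, so none of this parsing has to be written by hand.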