GATE Feature Overview
Lu Tingming (鲁廷明), 9 June 2009

Contents
Overview
Features
Limitations of this study

Overview (1)

GATE is a General Architecture for Text Engineering.
Developed by the Natural Language Processing Research Group within the Department of Computer Science at the University of Sheffield.

Overview (2)

Language Resources (LRs) are data-only resources, such as a document or a corpus.
Processing Resources (PRs) are resources whose character is principally programmatic or algorithmic, such as a tokeniser or a POS tagger.
Applications model a control strategy for the execution of PRs. There are two main types of pipeline:
    Simple pipelines
    Corpus pipelines

Overview (4)

document, annotationSet, annotationType, feature

Features

Tokeniser
Splits text into tokens. Each Token annotation carries the following features:
    kind: Word, Number, Symbol, Punctuation, SpaceToken
    orth: upperInitial, allCaps, lowerCase, mixedCaps
    length
    string

Sentence Splitter
Splits text into sentences.

Gazetteer
Dictionary lookup.
lists.def contains entries such as:
    country.lst:location:country
country.lst contains entries such as:
    China
    Chine
    Chypre
    Colombia
    Colombie
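
As a rough illustration of the Tokeniser and Gazetteer steps above, the following Python sketch assigns the kind, orth, length and string features to each token and then looks tokens up in a gazetteer list. It is not GATE's implementation or API: the regular expression, the function names (`tokenise`, `load_gazetteer`, `lookup`) and the in-memory stand-ins for lists.def and country.lst are assumptions made for this example, and only single-word entries are matched. The `majorType` and `minorType` names reflect one reading of the `location:country` part of the lists.def line.

```python
import re

# Hypothetical in-memory stand-ins for the lists.def and country.lst files.
LISTS_DEF = "country.lst:location:country"
LIST_FILES = {"country.lst": ["China", "Chine", "Chypre", "Colombia", "Colombie"]}

# One named alternative per coarse token class, tried left to right.
TOKEN_RE = re.compile(
    r"(?P<word>[^\W\d_]+)|(?P<number>\d+)|(?P<space>\s+)|(?P<other>.)"
)
PUNCTUATION = set(".,;:!?\"'()[]{}-")


def orth_of(word):
    """Classify the capitalisation pattern of a word token."""
    if word.islower():
        return "lowerCase"
    if word[0].isupper() and (len(word) == 1 or word[1:].islower()):
        return "upperInitial"
    if word.isupper():
        return "allCaps"
    return "mixedCaps"


def tokenise(text):
    """Return one dict per token, with offsets and a feature map."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        string = match.group()
        if match.lastgroup == "word":
            kind = "Word"
        elif match.lastgroup == "number":
            kind = "Number"
        elif match.lastgroup == "space":
            kind = "SpaceToken"
        elif string in PUNCTUATION:
            kind = "Punctuation"
        else:
            kind = "Symbol"
        features = {"kind": kind, "length": len(string), "string": string}
        if kind == "Word":
            features["orth"] = orth_of(string)
        tokens.append(
            {"start": match.start(), "end": match.end(), "features": features}
        )
    return tokens


def load_gazetteer(lists_def, list_files):
    """Map each list entry to the (major, minor) types named in lists.def."""
    entries = {}
    for line in lists_def.splitlines():
        file_name, major, minor = line.strip().split(":")
        for entry in list_files[file_name]:
            entries[entry] = (major, minor)
    return entries


def lookup(tokens, gazetteer):
    """Return a lookup record for every token whose string is a list entry."""
    found = []
    for token in tokens:
        types = gazetteer.get(token["features"]["string"])
        if types is not None:
            found.append(
                {
                    "start": token["start"],
                    "end": token["end"],
                    "majorType": types[0],
                    "minorType": types[1],
                }
            )
    return found


if __name__ == "__main__":
    tokens = tokenise("I visited China in 2009.")
    for token in tokens:
        print(token)
    print(lookup(tokens, load_gazetteer(LISTS_DEF, LIST_FILES)))
```

For the sample sentence, the sketch marks "China" as a location/country lookup and labels "2009" as a Number token.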

Part of Speech Tagger
Performs part-of-speech tagging. Some tags are wrong, for example:
    I will study hard this year.
Here "hard" is tagged JJ (adjective); it should be RB (adverb).

Semantic Tagger
This is the NE Transducer, which performs named entity recognition.

Orthographic Coreference (Orthomatcher)
The Orthomatcher module adds identity relations between named entities found by the semantic tagger, in order to perform coreference.

Pronominal Coreference
Links personal names with the pronouns that refer to them, for example: John Smith … he … him … John … he …

Document Reset
Removes all the annotation sets and their contents, apart from the one containing the document format analysis (Original Markups).

Verb Group Chunker
The rules cover finite ('is investigating'), non-finite ('to investigate'), participles ('investigated'), and special verb constructs ('is going to investigate').

Noun Phrase Chunker
Marks noun phrases in text.

OntoText Gazetteer
Produces results similar to the ANNIE Gazetteer, but uses a different algorithm.

Flexible Gazetteer
The Flexible Gazetteer provides users with the flexibility to choose their own customized input and an external Gazetteer.

Gazetteer List Collector
Inserts entities of the specified annotation types into the corresponding lists of a specified Gazetteer and generates a statistics file (entity name$count).
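
The behaviour described for the Gazetteer List Collector can be sketched in a few lines of Python. This is an illustration of the idea only, not the plugin's code: the function name `collect`, the `(annotation type, text)` input format and the `type_to_list` mapping are assumptions made for this example. The only detail taken from the slide is the `entity name$count` layout of the statistics lines.

```python
from collections import Counter


def collect(annotations, gazetteer_lists, type_to_list):
    """Add annotated entities to gazetteer lists and return statistics lines.

    annotations:     iterable of (annotation_type, text) pairs
    gazetteer_lists: dict mapping a list name to its entries (updated in place)
    type_to_list:    dict mapping a selected annotation type to a list name
    """
    counts = Counter()
    for annotation_type, text in annotations:
        list_name = type_to_list.get(annotation_type)
        if list_name is None:
            continue  # this annotation type was not selected
        counts[text] += 1
        entries = gazetteer_lists.setdefault(list_name, [])
        if text not in entries:
            entries.append(text)  # new entity: extend the list
    # One "entity$count" line per entity, most frequent first.
    return [f"{entity}${count}" for entity, count in counts.most_common()]


if __name__ == "__main__":
    lists = {"country.lst": ["China"]}
    found = [
        ("Location", "China"),
        ("Person", "John Smith"),
        ("Location", "Colombia"),
        ("Location", "China"),
    ]
    for line in collect(found, lists, {"Location": "country.lst"}):
        print(line)
    print(lists)
```

In this run only Location annotations are selected, so "John Smith" is ignored, "Colombia" is appended to country.lst, and the statistics lines count "China" twice and "Colombia" once.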

Tree Tagger
The TreeTagger is a language-independent part-of-speech tagger.
The TreeTagger is a tool for annotating text with part-of-speech and lemma information. It was developed by Helmut Schmid in the TC project at the Institute for Computational Linguistics of the University of Stuttgart. The TreeTagger has been successfully used to tag German, English, French, Italian, Dutch, Spanish, Bulgarian, Russian, Greek, Portuguese, Chinese and old French texts and is adaptable to other languages if a lexicon and a manually tagged training corpus are available.
Analysing an English file from the command line succeeded:
    cd \treetagger\bin
    tag-english.bat news1.txt
However, it could not be integrated into GATE.

Stemmer
Each Token is annotated with a new feature "stem", with the stem for that word as its value.

GATE Morphological Analyzer
Considering one token and its part of speech tag, one at a time, it identifies its lemma and an affix. These values are then added as features on the Token annotation.

MiniPar Parser
It takes one sentence as an input and determines the dependency relationships between the words of a sentence.

RASP Parser
RASP (Robust Accurate Statistical Parsing) is a robust parsing system for English. It comprises the following four PRs:
    RASP2 Tokenizer
    RASP2 POS Tagger
    RASP2 Morphological Analyser
    RASP2 Parser: creates multiple dependency annotations to represent a parse of each sentence.
RASP is only supported for Linux operating systems.

SUPPLE Parser
SUPPLE is a bottom-up parser that constructs syntax trees and logical forms for English sentences.
Requires a Prolog interpreter.

Stanford Parser

Montreal Transducer
Many of the key features introduced in the Montreal Transducer (MT) have now been ported in some form into the standard JAPE transducer. The standard JAPE transducer is likely to be more stable and bugs will be fixed more rapidly than with the MT.
Similar to the standard JAPE transducer; not investigated.

Chinese Plugin
The Chinese plugin contains a simple application for Chinese NE recognition (chinese.gapp).

Chemistry Tagger
This GATE module is designed to tag a number of chemistry items in running text. Currently the tagger tags compound formulas (e.g. SO2, H2O, H2SO4 ...), ions (e.g. Fe3+, Cl-) and element names and symbols (e.g. Sodium and Na). Limited support for compound names is also provided (e.g. sulphur dioxide) but only when followed by a compound formula (in parenthesis or commas).

Flexible Exporter
Lets you select several annotation types from one annotation set, writes the document with those annotations to a file, and can rename the annotation types in the output file.

Annotation Set Transfer
Moves (or copies) part of the annotations in one annotation set into another annotation set; the new set can then serve as input to other PRs for further processing.
For example, we might wish to perform named entity recognition on the body of an HTML text, but not on the headers. After tokenising and performing gazetteer lookup on the whole text, we would use the Annotation Set Transfer to transfer those annotations (created by the tokeniser and gazetteer) into a new annotation set, and then run the remaining NE resources, such as the semantic tagger and coreference modules, on that new set.
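
The HTML body example can be made concrete with a small Python sketch. It only models the idea of moving or copying annotations that fall inside a covering span. The plain-dict annotation format, the function name `transfer` and its parameters are assumptions made for this illustration and do not correspond to the GATE API.

```python
def transfer(source_set, target_set, cover_spans, types, copy=False):
    """Move (or copy) annotations of the given types that lie inside a span.

    source_set, target_set: lists of {"type", "start", "end"} dicts
    cover_spans:            list of (start, end) offsets, e.g. the HTML body
    types:                  annotation types to transfer
    copy:                   if True, leave the originals in source_set
    """
    selected = [
        annotation
        for annotation in source_set
        if annotation["type"] in types
        and any(
            start <= annotation["start"] and annotation["end"] <= end
            for start, end in cover_spans
        )
    ]
    target_set.extend(selected)
    if not copy:
        selected_ids = {id(annotation) for annotation in selected}
        source_set[:] = [a for a in source_set if id(a) not in selected_ids]
    return selected


if __name__ == "__main__":
    default_set = [
        {"type": "Token", "start": 0, "end": 5},     # inside the header
        {"type": "Token", "start": 10, "end": 15},   # inside the body
        {"type": "Lookup", "start": 10, "end": 15},  # inside the body
        {"type": "Sentence", "start": 10, "end": 20},
    ]
    body_set = []
    body_spans = [(8, 30)]  # hypothetical offsets of the HTML body
    transfer(default_set, body_set, body_spans, {"Token", "Lookup"})
    print("new set:", body_set)
    print("left behind:", default_set)
```

Later processing steps would then read only `body_set`, so the header token never reaches them.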

Information Retrieval in GATE
The current implementation is based on the most popular open source full-text search engine, Lucene.

Crawler
The crawler plugin enables GATE to be used for a corpus that is built using a web crawl. The crawler itself is Websphinx. This is a Java-based multi-threaded web crawler that can be customized for any application.

Google Plugin
This plugin allows the user to query Google and build the document corpus that contains the search results returned by Google for the query.

Yahoo Plugin
This plugin allows the user to query Yahoo and build the document corpus that contains the search results returned by Yahoo for the query.

WordNet in GATE
Execution failed with an error.

Machine Learning in GATE
MinorThird
MIAKT NLG Lexicon
It is not yet clear where this is used.

OntoRoot Gazetteer

Kea - Automatic Keyphrase Detection
Kea is a tool for automatic detection of keyphrases.
A model is trained first and can then be applied.

Ontotext JapeC Compiler
Japec is an alternative implementation of the JAPE language which works by compiling JAPE grammars into Java code. Compared to the standard implementation, these compiled grammars can be several times faster to run.

ANNIC
ANNIC (ANNotation
