Teaching Machines to Understand Humans: Exploring Question Answering Systems and Reading Comprehension (Naturali 奇点机智 CTO, ACL Fellow)

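Co-founder and CTO of Naturali (奇点机智); Fellow of the Association for Computational Linguistics (ACL Fellow); former senior staff scientist at […]; founder and technical […] of the search question answering system. More than 90 publications in natural language processing and understanding, with more than 14,000 citations in total. Specially appointed (work category) in the 13th batch of the […] municipal "[…]" program.

刘家骅 (Liu Jiahua): champion of the 2018 machine reading comprehension competition; Naturali (奇点机智); Department of Computer Science, […]; works on natural language processing and speech recognition; first prize in the informatics competition, member of the national training team.

Outline
Part 1
  • Question Answering (QA) and Reading Comprehension (RC)
  • Modular QA system: main components and concepts
Part 2
  • R[…]k and datasets
  • End-to-End RC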

Question Answering: Motivations
  • QA is the hallmark of NLU
      • best demonstration of understanding
      • involving full-spectrum of NLP techniques
  • One of the oldest NLP applications
      • automated customer service
      • virtual assistants

Deployed QA Systems
  • Amazon Alexa
  • Apple Siri
  • Wolfram Alpha
  • Search engines: […], Baidu, Bing, …

Types of Question Answerers
  • Specialized Services
  • Question-to-Question Map
  • KB-based Question Answering: translate questions to knowledge base retrieval queries
  • IR-based Question Answering: find answers in unstructured text

Specialized Answer Services
  • weather
  • stock/currency
  • sport scores
  • math
  • …

Specialized Answerer

Question-to-Question: an example

Knowledge Graph Example: Corona

Knowledge Base Answers

KB-based QA: Pros and Cons
  Pros
  • high precision
  • great for head queries
  Cons
  ✗ must anticipate all questions
  ✗ restricted to short answers
  ✗ costly to make data fresh and complete

Knowledge Graph Entry: Yanjing Beer

IR-based Question Answering
  [Diagram labels: query, Query analyzer, query rep, docs, Search Engine, Answer Extractor (answer type scorer, match scorer, aggregation scorer), answer]
  [Diagram labels, Reading Comprehension: query, Query analyzer, query rep, docs, Answer Extractor (answer type scorer, match scorer, aggregation scorer)]
  [Diagram labels, End-to-End RC: query, docs, answer]

IR-based Question Answering
  • Example query: 什么酶可以分解淀粉 (what enzyme can break down starch)

Query Analysis: Finding Focus Words
  • The focus words are words in a query that specify the type of the answer that the user is looking for.
  • A query is answer-seeking if and only if such focus words are found.

Question Answering with Unstructured Text
  • Query: [what enzyme breaks down starch]
  • Query analysis: [what enzyme breaks down starch]
  • Candidate extraction: "Secreted in the saliva, salivary amylase breaks down long-chain and branched carbohydrates into two- and three-molecule sugars called maltose. Once it enters the duodenum, the portion of the small intestine, pancreatic amylase converts to its active form. … Sucrase divides sucrose, more commonly known as table sugar, into its glucose and fructose components. ..."
  • Candidate scoring:
      • amylase (0.998)
      • amylase enzymes (0.925)
      • enzyme maltase (0.881)
      • salivary amylase (0.860)
      • protease enzymes (0.765)
      • protease (0.697)
      • lipase (0.665)
      • lipase enzymes (0.663)
      • sucrase (0.653)
      • mucin (0.633)
      • carbohydrase (0.525)
      • disaccharides (0.474)

Focus Words: Examples
  • Explicit questions
      • what enzyme breaks down starch?
      • 2014 ebola outbreak happened in what african countries
      • what is the melting point of paraffin wax
      • how much does a macbook air weigh
      • what do hedgehogs eat
  • Implicit questions
      • fastest animal
      • annual rainfall of beijing
  • Non-question:
      • when in rome rotten

Finding Focus Words in Explicit Questions
  • Wh-word itself: when/where/who
  • The noun phrase after which/what
      • what enzyme breaks down starch?
      • 2014 ebola outbreak happened in what african countries
      • what is the melting point of paraffin wax

Implicit Questions
  • Use explicit questions to obtain training examples for implicit questions:
  • Pattern: (what|who) (is|was|are|were) the Attribute of Entity
      • [what is the fastest animal]
      • [what is the annual rainfall of beijing]
  • Pattern: (what|who) (is|was|are|were) Entity's Attribute
      • [what is vietnam's population]
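The two explicit-question patterns listed for implicit questions can be tried out with a few lines of Python. This is only a minimal sketch: the function names, the choice to treat the Attribute as the focus words, and making "of Entity" optional (so that [what is the fastest animal] matches) are assumptions for illustration, not something the slides specify. A real system would use a parser to find the noun phrase after which/what.

```python
import re

# Pattern: (what|who) (is|was|are|were) the Attribute of Entity
# Assumption: "of Entity" is optional so "what is the fastest animal" matches.
ATTRIBUTE_OF_ENTITY = re.compile(
    r"^(?:what|who) (?:is|was|are|were) the (?P<attr>.+?)(?: of (?P<entity>.+))?$"
)
# Pattern: (what|who) (is|was|are|were) Entity's Attribute
ENTITY_S_ATTRIBUTE = re.compile(
    r"^(?:what|who) (?:is|was|are|were) (?P<entity>.+?)'s (?P<attr>.+)$"
)
# Wh-words that act as the focus word themselves.
WH_FOCUS = ("when", "where", "who")


def normalize(question):
    """Lowercase, drop a trailing '?', unify apostrophes and whitespace."""
    q = question.lower().replace("\u2019", "'").strip().rstrip("?")
    return " ".join(q.split())


def implicit_training_example(question):
    """Turn an explicit question into (implicit query, focus words), or None."""
    q = normalize(question)
    m = ATTRIBUTE_OF_ENTITY.match(q)
    if m:
        attr, entity = m.group("attr"), m.group("entity")
        implicit = f"{attr} of {entity}" if entity else attr
        return implicit, attr
    m = ENTITY_S_ATTRIBUTE.match(q)
    if m:
        return f"{m.group('entity')}'s {m.group('attr')}", m.group("attr")
    return None


def wh_word_focus(question):
    """Return the first when/where/who in the question, if any."""
    for token in normalize(question).split():
        if token in WH_FOCUS:
            return token
    return None


if __name__ == "__main__":
    for text in ("what is the annual rainfall of beijing",
                 "what is vietnam's population",
                 "how much does a macbook air weigh"):
        print(text, "->", implicit_training_example(text))
```

Each pair produced this way gives an implicit query together with its focus words, which is the kind of training example the slide describes; questions that fit neither pattern simply yield nothing.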

Expected Answer Type
  • Expected answer type of a question specifies what phrase/text can potentially be an answer (without knowing the context).
  • The expected answer type of a question depends on
      • the focus words of the question
      • the ability to recognize whether something belongs to that type

Coarse-Grain Answer Types
  • A small number of types: LOCATION, TIME, TEMPERATURE, …
  • Advantage: use a named entity recognizer to determine what phrase matches the type.
  • Disadvantage: the map from the focus word in the query to an answer type is non-trivial
      • melting point -> TEMPERATURE
      • annual rainfall -> LENGTH
      • assistant coach ->

Lexical Answer Types
  • Use the focus word in the query as the answer type
  • Trivial map from focus words to answer types
  • Similar types are modeled separately
      • E.g., diameter and width are treated as two different types.

Extract Answer Type Examples with Patterns
  • Numerical types: (has|with) (a|an) TYPE of
  • Entity types: TYPE(s), such as

Answer Type Example: annual rainfall
  • The Zatecka Basin, the driest area, has an annual rainfall of about 18 inches.
  • More than 80% of the continent has an annual rainfall of less than 600 mm;
  • Cuzco with a mea[…]age temperature of 10.7°C, and highest average monthly 12.1, has an annual rainfall of 804 mm.
  • along the western Ghats escarpment, has an annual rainfall of about 5600 mm which is towards the mid-range of rain stations in the area
  • For example, the 'region comprising New Mexico, Arizona, Colorado, Utah, Nevada, and Wyoming has an annual rainfall of from eight to sixteen inches
  • Durban has an annual rainfall of 1,009 millimeters.

Answer Type Example: sleeping pills
  • There is little evidence that the newer generation sleeping pills such as Ambien or Lunesta are more effective than older sleeping pills such as Dalmane or ...
  • Diphenhydramine is found in many popular over the counter sleeping pills such as Tylenol PM, Excedrin PM and Nytol.
  • Sedative sleeping pills such as Ambien can nearly double the risk for car accidents among new users compared with nonusers
  • A newer class of non-benzodiazepine sleeping pills such as eszopiclone or zolpidem appears to be safe to use by patients with OSA
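The two extraction patterns above, "(has|with) (a|an) TYPE of" and "TYPE(s), such as", can be approximated with regular expressions. The sketch below is illustrative only: the function names are made up, the numeric value is assumed to be an optional hedge ("about", "less than", "more than") followed by digits and a unit, the comma before "such as" is treated as optional because the example sentences omit it, and only the first word after "such as" is captured.

```python
import re


def numeric_examples(answer_type, sentences):
    """Collect values matched by '(has|with) (a|an) TYPE of <value>'."""
    pattern = re.compile(
        rf"\b(?:has|with) an? {re.escape(answer_type)} of "
        r"(?P<value>(?:about |less than |more than )?\d[\d,.]*\s?[A-Za-z°%]+)",
        re.IGNORECASE,
    )
    return [m.group("value") for s in sentences for m in pattern.finditer(s)]


def entity_examples(answer_type, sentences):
    """Collect the first word after 'TYPE(s), such as' (comma optional)."""
    pattern = re.compile(
        rf"\b{re.escape(answer_type)}s?,? such as (?P<first>\w+)",
        re.IGNORECASE,
    )
    return [m.group("first") for s in sentences for m in pattern.finditer(s)]


if __name__ == "__main__":
    rainfall = [
        "The Zatecka Basin, the driest area, has an annual rainfall of about 18 inches.",
        "Durban has an annual rainfall of 1,009 millimeters.",
    ]
    pills = [
        "Sedative sleeping pills such as Ambien can nearly double the risk "
        "for car accidents among new users compared with nonusers",
    ]
    print(numeric_examples("annual rainfall", rainfall))
    print(entity_examples("sleeping pill", pills))
```

Values written out in words ("from eight to sixteen inches") are not matched by this simplified value pattern, and multi-word instances such as "Tylenol PM" are cut to their first word; both would need a richer pattern or a chunker.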

Answer Candidate Scoring
  • Answer type score
  • Match score:
      • proximity to query keywords
      • slot matching
  • Aggregation score
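How the three scores are combined is not stated in the slides. As one possible reading, the sketch below multiplies an answer type score by a proximity-based match score for each mention and aggregates mentions of the same candidate with a noisy-or. The proximity formula, the multiplication, the noisy-or, and all names are assumptions chosen for illustration; slot matching is not modeled.

```python
from collections import defaultdict


def proximity_score(tokens, position, keywords):
    """Average closeness of the candidate at `position` to each query keyword."""
    scores = []
    for keyword in keywords:
        distances = [abs(i - position) for i, tok in enumerate(tokens) if tok == keyword]
        # A keyword that is absent from the passage contributes 0.
        scores.append(1.0 / (1 + min(distances)) if distances else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def score_candidates(mentions, keywords):
    """Rank candidates.

    mentions: iterable of (candidate, passage_tokens, position, type_score),
    where type_score in [0, 1] says how well the candidate fits the
    expected answer type.
    """
    # miss[c] is the probability that no mention supports candidate c.
    miss = defaultdict(lambda: 1.0)
    for candidate, tokens, position, type_score in mentions:
        mention_score = type_score * proximity_score(tokens, position, keywords)
        miss[candidate] *= 1.0 - mention_score
    ranked = [(candidate, 1.0 - m) for candidate, m in miss.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


if __name__ == "__main__":
    p1 = "salivary amylase breaks down starch".split()
    p2 = "sucrase divides sucrose and amylase breaks starch".split()
    mentions = [("amylase", p1, 1, 0.9), ("amylase", p2, 4, 0.9), ("sucrase", p2, 0, 0.9)]
    for candidate, score in score_candidates(mentions, ["breaks", "starch"]):
        print(f"{candidate}: {score:.3f}")
```

With this aggregation a candidate that is supported by several passages ends up above one seen only once at a similar distance, which is the intuition behind having an aggregation score at all.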

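End-to-End Reading Comprehension
  • Task definition
  • Datasets
  • Task evolution
  • Mainstream models

Reading Comprehension: Task Definition
  • The reading comprehension task can be defined as a supervised learning problem (Hermann et al., 2015)
  • Given a context document c (passage p), a related question q, and the answer a to the question, estimate the conditional probability P(a|c,q)
  • Annotated dataset: made up of <document c, question q, answer a> triples

Reading Comprehension Datasets: […]/Daily Mail
  • Source documents are […] and Daily Mail articles
  • Key-point sentences of each article are turned into cloze-style questions
  • The answer is an entity in the document

Reading Comprehension Datasets: SQuAD
  • Source documents are paragraphs from Wikipedia pages
  • Questions are written by humans based on the document
  • The answer is a contiguous span of the document

Reading Comprehension Datasets: MS MARCO
  • Questions come from real user queries to a search engine
  • Related documents are passages that an information retrieval system obtained from real web pages; each question has multiple passages
  • Annotated answers are written by humans as summaries of the documents

Reading Comprehension Datasets: DuReader
  • Questions come from real user queries to a search engine, including factual and opinion questions
  • Related documents are the full text of the top-ranked web pages returned by a search engine; one question corresponds to multiple documents
  • Annotated answers are written by humans as summaries of the documents; one question may have multiple answers

Evolution of the Reading Comprehension Task
  • Increasingly close to real-world scenarios, increasingly […]
  • Closed domain -> open domain
  • Questions: cloze-style and human-written questions -> real user queries
  • Related documents: high-quality documents -> web page content; a single simple paragraph -> multiple long documents
  • Answers: a single word or contiguous span in the document -> human-written summaries that may contain several sentences; one answer -> multiple answers

End-to-End Model Structure
  • Vector feature representation
  • Encode the question and the passage separately
  • Fuse question and passage information with an attention mechanism
  • Extract the answer with a pointer network

End-to-End Model
  • Convert the words of the question and the passage into vector feature representations
      • Word vectors (GloVe, etc.)
      • Word representations built from character vector sequences (English)
      • Part-of-speech tagging (POS t…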
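The preview text breaks off in the feature list, so the answer extraction step is not spelled out. For span answers, a pointer-style extractor is commonly read as producing a start distribution and an end distribution over passage tokens, with P(a|c,q) approximated by their product. The sketch below shows only the final decoding step under that assumption; `best_span`, `max_len`, and the toy probabilities are hypothetical and do not come from the slides.

```python
def best_span(p_start, p_end, max_len=15):
    """Pick the span (s, e) with s <= e that maximizes p_start[s] * p_end[e].

    p_start, p_end: per-token probabilities over the passage, as a pointer
    network's two output distributions might provide (assumption).
    max_len: assumed cap on the answer length in tokens.
    """
    best, best_prob = (0, 0), -1.0
    for s, ps in enumerate(p_start):
        # The end pointer may not precede the start pointer.
        for e in range(s, min(s + max_len, len(p_end))):
            prob = ps * p_end[e]
            if prob > best_prob:
                best, best_prob = (s, e), prob
    return best, best_prob


if __name__ == "__main__":
    tokens = "salivary amylase breaks down starch".split()
    p_start = [0.60, 0.30, 0.05, 0.03, 0.02]
    p_end = [0.10, 0.70, 0.10, 0.05, 0.05]
    (s, e), prob = best_span(p_start, p_end)
    print(" ".join(tokens[s:e + 1]), round(prob, 3))
```

The exhaustive search is quadratic in the worst case but bounded by `max_len`, which keeps it cheap for paragraph-length passages; it also guarantees the returned span is well-formed even when the two distributions peak in the wrong order.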
