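First speaker [name missing in source]
- Co-founder and CTO of Naturali (奇点机智)
- Fellow of the Association for Computational Linguistics (ACL Fellow)
- Former senior management scientist at […]
- Founder and technical […] of a search question answering system
- More than 90 publications in natural language processing and understanding, cited more than 14,000 times in total
- Specially appointed in the work category of the 13th batch of a municipal "[…]" program

刘家骅 (Liu Jiahua)
- Champion of a 2018 machine reading comprehension competition
- Naturali (奇点机智)
- Department of Computer Science, […]
- Interests: natural language processing, speech recognition
- First prize in an informatics competition; member of the national training team

Outline

Part 1
- Question Answering (QA) and Reading Comprehension (RC)
- Modular QA system: main components and concepts

Part 2
- R[…]k and datasets
- End-to-End RC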

Question Answering: Motivations
- QA is the hallmark of NLU
  - best demonstration of understanding
  - involves the full spectrum of NLP techniques
- One of the oldest NLP applications
  - automated customer service
  - virtual assistants

Deployed QA Systems
- Amazon Alexa
- Apple Siri
- Wolfram Alpha
- Search engines: […], Baidu, Bing, …

Types of Question Answerers
- Specialized services
- Question-to-question map
- KB-based question answering: translate questions to knowledge base retrieval queries
- IR-based question answering: find answers in unstructured text

Specialized Answer Services
- weather
- stock/currency
- sport scores
- math
- …

Specialized Answerer
[figure]

Question-to-Question: an example
[figure]

Knowledge Graph Example: Corona
[figure]

Knowledge Base Answers
[figure]

KB-based QA: Pros and Cons
Pros:
- high precision
- great for head queries
Cons:
- must anticipate all questions
- restricted to short answers
- costly to keep data fresh and complete
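
As a rough illustration of "translate questions to knowledge base retrieval queries", the sketch below maps one question pattern to a lookup key. The toy `KB` dictionary, the single regular expression, and the function names are all hypothetical; they are not taken from the talk and only show the idea, including why such a system must anticipate its questions.

```python
import re

# Hypothetical toy knowledge base: (entity, attribute) -> value.
KB = {
    ("france", "capital"): "paris",
    ("japan", "capital"): "tokyo",
}

# A single hand-written question pattern (an assumption for this sketch).
# A real system would need many such patterns or a learned semantic parser.
QUESTION_PATTERN = re.compile(
    r"^what is the (?P<attribute>[\w ]+?) of (?P<entity>[\w ]+?)\??$"
)


def question_to_query(question):
    """Translate a question into an (entity, attribute) key, or None."""
    match = QUESTION_PATTERN.match(question.strip().lower())
    if match is None:
        return None
    return (match.group("entity"), match.group("attribute"))


def answer(question):
    """Answer a question from the toy KB; None if it was not anticipated."""
    key = question_to_query(question)
    return KB.get(key) if key is not None else None


if __name__ == "__main__":
    print(answer("What is the capital of France?"))
    print(answer("which city is france's capital"))
```

The second call returns `None` because its wording was not anticipated by the pattern, which mirrors the first item in the list of cons.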

Knowledge Graph Entry: Yanjing Beer
[figure]

IR-based Question Answering
[Diagram labels: query, Query Analyzer, query rep, Search Engine, docs (D), Answer Extractor with answer type scorer, match scorer and aggregation scorer]

Reading Comprehension
[Diagram labels: query, Query Analyzer, query rep, docs (D), Answer Extractor with answer type scorer, match scorer and aggregation scorer]

Reading Comprehension
[Diagram labels: query, docs (D), End-to-End RC, answer]

IR-based Question Answering
Example query: 什么酶可以分解淀粉 (what enzyme can break down starch)

Query Analysis: Finding Focus Words
- The focus words are the words in a query that specify the type of answer the user is looking for.
- A query is answer-seeking if and only if such focus words are found.

Question Answering with Unstructured Text
Query: [what enzyme breaks down starch]
Text:
- Secreted in the saliva, salivary amylase breaks down long-chain and branched carbohydrates into two- and three-molecule sugars called maltose. Once it enters the duodenum, the portion of the small intestine, pancreatic amylase converts to its active form. …
- Sucrase divides sucrose, more commonly known as table sugar, into its glucose and fructose components. ...
Steps: Query Analysis, Candidate Extraction, Candidate Scoring
Scored candidates:
- amylase (0.998)
- amylase enzymes (0.925)
- enzyme maltase (0.881)
- salivary amylase (0.860)
- protease enzymes (0.765)
- protease (0.697)
- lipase (0.665)
- lipase enzymes (0.663)
- sucrase (0.653)
- mucin (0.633)
- carbohydrase (0.525)
- disaccharides (0.474)

Focus Words: Examples
Explicit questions:
- what enzyme breaks down starch?
- 2014 ebola outbreak happened in what african countries
- what is the melting point of paraffin wax
- how much does a macbook air weigh
- what do hedgehogs eat
Implicit questions:
- fastest animal
- annual rainfall of beijing
Non-questions:
- when in rome
- rotten

Finding Focus Words in Explicit Questions
- The wh-word itself: when/where/who
- The noun phrase after which/what
  - what enzyme breaks down starch?
  - 2014 ebola outbreak happened in what african countries
  - what is the melting point of paraffin wax
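
The two rules above can be sketched in a few lines. This is only a toy: a real system would use a part-of-speech tagger or noun-phrase chunker, whereas the `SKIP` and `NP_BOUNDARY` word lists here are hand-made, hypothetical stand-ins chosen so that the slide's examples work.

```python
# Rule 1: for these wh-words the focus is the wh-word itself.
WH_SELF = {"when", "where", "who"}
# Rule 2: for these wh-words the focus is the noun phrase that follows.
WH_NP = {"which", "what"}

# Hypothetical stand-ins for a real tagger/chunker (assumptions of this sketch):
# words skipped before the noun phrase, and words that end the noun phrase.
SKIP = {"is", "was", "are", "were", "the", "a", "an"}
NP_BOUNDARY = {"of", "in", "on", "for", "to", "breaks", "happened",
               "does", "do", "did"}


def find_focus(query):
    """Return the focus words of an explicit question, or None."""
    tokens = query.lower().rstrip("?").split()
    for i, token in enumerate(tokens):
        if token in WH_SELF:
            return token
        if token in WH_NP:
            rest = tokens[i + 1:]
            while rest and rest[0] in SKIP:
                rest = rest[1:]
            phrase = []
            for word in rest:
                if word in NP_BOUNDARY:
                    break
                phrase.append(word)
            return " ".join(phrase) or None
    return None


if __name__ == "__main__":
    for q in ["what enzyme breaks down starch?",
              "2014 ebola outbreak happened in what african countries",
              "what is the melting point of paraffin wax",
              "fastest animal"]:
        print(q, "->", find_focus(q))
```

Note that this rule alone would also return `when` for "when in rome", which the examples list as a non-question, so surface rules of this kind need a further check before a query is treated as answer-seeking.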

Implicit Questions
Use explicit questions to obtain training examples for implicit questions:
- Pattern: (what|who) (is|was|are|were) the Attribute of Entity
  - [what is the fastest animal]
  - [what is the annual rainfall of beijing]
- Pattern: (what|who) (is|was|are|were) Entity's Attribute
  - [what is vietnam's population]
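
A minimal sketch of how the two patterns could turn explicit questions into (implicit question, focus) training pairs. The function name and the output format are hypothetical. One further assumption: the "of Entity" part is made optional so that the example [what is the fastest animal] also matches the first pattern.

```python
import re

# Pattern 1: (what|who) (is|was|are|were) the Attribute of Entity.
# Assumption of this sketch: " of Entity" is optional.
ATTR_OF_ENTITY = re.compile(
    r"^(?:what|who) (?:is|was|are|were) the (?P<attr>.+?)(?: of (?P<entity>.+))?$"
)
# Pattern 2: (what|who) (is|was|are|were) Entity's Attribute.
ENTITY_S_ATTR = re.compile(
    r"^(?:what|who) (?:is|was|are|were) (?P<entity>.+?)['’]s (?P<attr>.+)$"
)


def to_training_example(explicit_question):
    """Return (implicit_question, focus_words), or None if no pattern matches."""
    q = explicit_question.strip().lower().rstrip("?")
    m = ENTITY_S_ATTR.match(q)
    if m:
        return (f"{m.group('entity')} {m.group('attr')}", m.group("attr"))
    m = ATTR_OF_ENTITY.match(q)
    if m:
        attr, entity = m.group("attr"), m.group("entity")
        implicit = f"{attr} of {entity}" if entity else attr
        return (implicit, attr)
    return None


if __name__ == "__main__":
    for q in ["what is the fastest animal",
              "what is the annual rainfall of beijing",
              "what is vietnam's population"]:
        print(q, "->", to_training_example(q))
```

Stripping the question prefix yields the implicit form ("annual rainfall of beijing"), while the matched Attribute supplies the focus label, so no manual annotation of implicit queries is needed.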

Expected Answer Type
The expected answer type of a question specifies what phrase/text can potentially be an answer (without knowing the context).
The expected answer type of a question depends on:
- the focus words of the question
- the ability to recognize whether something belongs to that type

Coarse-Grain Answer Types
A small number of types: LOCATION, TIME, TEMPERATURE, …
- Advantage: a named entity recognizer can determine which phrases match the type.
- Disadvantage: the mapping from the focus word in the query to an answer type is non-trivial:
  - melting point -> TEMPERATURE
  - annual rainfall -> LENGTH
  - assistant coach -> (no type given)
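
The trade-off can be made concrete with a small sketch. The mapping table reuses the slide's two examples; the regular expressions are hypothetical stand-ins for a named entity recognizer, and the function names are invented for illustration.

```python
import re

# Hand-written focus-word -> coarse type table (the non-trivial part).
FOCUS_TO_TYPE = {
    "melting point": "TEMPERATURE",
    "annual rainfall": "LENGTH",
}

# Hypothetical stand-ins for a named entity recognizer, one per coarse type.
TYPE_RECOGNIZERS = {
    "TEMPERATURE": re.compile(r"^-?\d+(?:\.\d+)?\s?°\s?[CF]$"),
    "LENGTH": re.compile(
        r"^\d[\d,]*(?:\.\d+)?\s?(?:mm|cm|m|km|inches|millimeters)$"
    ),
}


def answer_type(focus):
    """Map focus words to a coarse answer type, or None if unmapped."""
    return FOCUS_TO_TYPE.get(focus)


def matches_expected_type(focus, phrase):
    """Check whether a candidate phrase fits the type expected by the focus."""
    coarse = answer_type(focus)
    if coarse is None:
        return False
    return TYPE_RECOGNIZERS[coarse].match(phrase) is not None


if __name__ == "__main__":
    print(matches_expected_type("annual rainfall", "804 mm"))
    print(matches_expected_type("melting point", "804 mm"))
    print(answer_type("assistant coach"))
```

Recognition is easy once the type is known, but every new focus word ("assistant coach") needs a new table entry, which is the disadvantage listed above.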

Lexical Answer Types
- Use the focus word in the query as the answer type.
- The mapping from focus words to answer types is trivial.
- Similar types are modeled separately, e.g., diameter and width are treated as two different types.

Extract Answer Type Examples with Patterns
- Numerical types: (has|with) (a|an) TYPE of
- Entity types: TYPE(s), such as

Answer Type Example: annual rainfall
- The Zatecka Basin, the driest area, has an annual rainfall of about 18 inches.
- More than 80% of the continent has an annual rainfall of less than 600 mm;
- Cuzco with a mea[…]age temperature of 10.7°C, and highest average monthly 12.1, has an annual rainfall of 804 mm.
- along the western Ghats escarpment, has an annual rainfall of about 5600 mm which is towards the mid-range of rain stations in the area
- For example, the 'region comprising New Mexico, Arizona, Colorado, Utah, Nevada, and Wyoming has an annual rainfall of from eight to sixteen inches
- Durban has an annual rainfall of 1,009 millimeters.

Answer Type Example: sleeping pills
- There is little evidence that the newer generation sleeping pills such as Ambien or Lunesta are more effective than older sleeping pills such as Dalmane or ...
- Diphenhydramine is found in many popular over the counter sleeping pills such as Tylenol PM, Excedrin PM and Nytol.
- Sedative sleeping pills such as Ambien can nearly double the risk for car accidents among new users compared with nonusers
- A newer class of non-benzodiazepine sleeping pills such as eszopiclone or zolpidem appears to be safe to use by patients with OSA
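
The two extraction patterns can be tried directly on the example sentences above. The sketch below is an assumption-laden approximation: the value and item sub-patterns (a number plus a unit, and capitalized names) are hypothetical choices made here, since the slides give only the surrounding template.

```python
import re

# Hypothetical sub-pattern for one entity name: capitalized word(s).
ITEM = r"[A-Z]\w+(?: [A-Z]\w+)*"


def numeric_examples(answer_type, sentences):
    """Apply '(has|with) (a|an) TYPE of' and keep the number and unit."""
    pattern = re.compile(
        r"\b(?:has|with) an? " + re.escape(answer_type)
        + r" of (?:\w+ ){0,2}?(\d[\d,.]*\s*[A-Za-z°]+)"
    )
    return [m.group(1) for s in sentences for m in pattern.finditer(s)]


def entity_examples(answer_type, sentences):
    """Apply 'TYPE(s), such as' and split the list that follows."""
    pattern = re.compile(
        re.escape(answer_type) + r",? such as ("
        + ITEM + r"(?:(?:,? (?:and|or) |, )" + ITEM + r")*)"
    )
    examples = []
    for s in sentences:
        for m in pattern.finditer(s):
            examples.extend(re.split(r",? (?:and|or) |, ", m.group(1)))
    return examples


if __name__ == "__main__":
    rainfall = [
        "The Zatecka Basin, the driest area, has an annual rainfall of about 18 inches.",
        "More than 80% of the continent has an annual rainfall of less than 600 mm;",
        "Durban has an annual rainfall of 1,009 millimeters.",
    ]
    pills = [
        "Diphenhydramine is found in many popular over the counter sleeping "
        "pills such as Tylenol PM, Excedrin PM and Nytol.",
    ]
    print(numeric_examples("annual rainfall", rainfall))
    print(entity_examples("sleeping pills", pills))
```

Because `ITEM` requires capitalized names, lowercase examples such as "eszopiclone or zolpidem" are missed, and spelled-out quantities ("eight to sixteen inches") are skipped by the numeric sub-pattern; both are limitations of this sketch rather than of the patterns on the slide.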

Answer Candidate Scoring
- Answer type score
- Match score:
  - proximity to query keywords
  - slot matching
- Aggregation score
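
The slides name the three scores but not their formulas, so the following is only one plausible way to combine them. The inverse-distance proximity measure, the product of type and match scores, and the noisy-or aggregation over mentions are all assumptions of this sketch, as are the function and field names; slot matching is left out.

```python
from collections import defaultdict


def proximity_score(tokens, cand_pos, keywords):
    """Match score: mean of 1 / (1 + distance) to the nearest keyword hit."""
    scores = []
    for kw in keywords:
        positions = [i for i, t in enumerate(tokens) if t == kw]
        if positions:
            dist = min(abs(i - cand_pos) for i in positions)
            scores.append(1.0 / (1.0 + dist))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores) if scores else 0.0


def noisy_or(scores):
    """Aggregation score: probability that at least one mention is right."""
    remaining = 1.0
    for s in scores:
        remaining *= 1.0 - s
    return 1.0 - remaining


def score_candidates(mentions, keywords):
    """Rank candidate strings by aggregating their per-mention scores."""
    per_candidate = defaultdict(list)
    for m in mentions:
        match = proximity_score(m["tokens"], m["position"], keywords)
        per_candidate[m["text"]].append(m["type_score"] * match)
    ranked = [(text, noisy_or(s)) for text, s in per_candidate.items()]
    return sorted(ranked, key=lambda pair: -pair[1])


if __name__ == "__main__":
    tokens = "salivary amylase breaks down starch".split()
    mentions = [
        {"text": "amylase", "tokens": tokens, "position": 1, "type_score": 1.0},
        {"text": "starch", "tokens": tokens, "position": 4, "type_score": 0.1},
    ]
    print(score_candidates(mentions, ["breaks", "down"]))
```

A candidate mentioned in several passages accumulates evidence through the aggregation step, which is one way a short answer like "amylase" could outrank longer variants that appear only once.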
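
End-to-End Reading Comprehension
- Task definition
- Datasets
- Task evolution
- Mainstream models

Reading Comprehension: Task Definition
- The reading comprehension task can be defined as a supervised learning problem (Hermann et al., 2015).
- Given a context document c (passage p), a related question q, and the answer a, estimate the conditional probability P(a|c,q).
- The annotated dataset consists of <document c, question q, answer a> triples.

Reading Comprehension Datasets: CNN/Daily Mail
- Source documents are CNN and Daily Mail articles.
- The summary-point sentences of each article are turned into cloze-style questions.
- The answer is an entity in the document.

Reading Comprehension Datasets: SQuAD
- Source documents are paragraphs from Wikipedia pages.
- Questions are written by humans based on the document.
- The answer is a contiguous span of the document.

Reading Comprehension Datasets: MS MARCO
- Questions are real user queries from a search engine.
- Related documents are passages that an information retrieval system obtained from real web pages; each question has multiple passages.
- Annotated answers are written by humans who summarize the documents.

Reading Comprehension Datasets: DuReader
- Questions are real user queries from a search engine, covering both factual and opinion questions.
- Related documents are the full text of the top-ranked web pages returned by a search engine; each question has multiple documents.
- Annotated answers are written by humans who summarize the documents; a question may have multiple answers.

Reading Comprehension: Task Evolution
The task is getting ever closer to real-world scenarios and ever […]:
- Domain: closed domain -> open domain
- Questions: cloze-style and human-written questions -> real user queries
- Related documents: high-quality documents -> web page content; one simple paragraph -> multiple long documents
- Answers: a single word or contiguous span in the document -> human-written summary answers that may contain several sentences; one answer -> multiple answers

End-to-End Model Structure
- Vector feature representation
- Encode the question and the passage separately
- Fuse question and passage information with an attention mechanism
- Extract the answer with a pointer network

End-to-End Model
Convert the words of the question and the passage into vector feature representations:
- word embeddings (GloVe, etc.)
- word representations built from character embedding sequences (English)
- part-of-speech tags (POS t…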
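
For the answer extraction step in the model structure above, span-based models commonly let the pointer network produce one probability per passage position for the answer start and one for the answer end. Assuming that setup (the slides do not spell it out), the sketch below picks the most probable valid span; the function name and the `max_len` limit are hypothetical.

```python
def best_span(start_probs, end_probs, max_len=15):
    """Return ((start, end), score) maximizing start_probs[i] * end_probs[j]
    subject to i <= j < i + max_len."""
    best = (0, 0)
    best_score = -1.0
    for i, p_start in enumerate(start_probs):
        # Only consider ends at or after the start, within max_len tokens.
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score


if __name__ == "__main__":
    start = [0.1, 0.7, 0.2]
    end = [0.6, 0.1, 0.3]
    print(best_span(start, end))
```

Taking the two argmax positions independently would give start 1 and end 0 in this example, which is not a valid span; the joint search enforces start <= end and returns the span (1, 2) instead.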