Research Topic Sharing: Knowledge Graphs (0123)

1. Typical Applications of Knowledge Graphs

Outline: 1.1 query understanding, 1.2 question answering, 1.3 document representation.

1.1 Query Understanding
• A query is typical short text: it usually consists of only a few words, with very little context for interpretation.
• Knowledge graphs support query-understanding tasks such as spelling correction of queries and click-through-rate (CTR) estimation for ads.

1.2 Question Answering (QA)
• Community-QA subtasks that benefit from knowledge graphs: similar question retrieval, question classification, answer quality prediction, answer summarization, question routing within the community, and expert recommendation.
• Challenges include how to exploit large external resources such as query logs and anchor text.

Distributed representations (deep learning) for QA over knowledge bases
• Two directions: use deep learning to improve individual components of the traditional pipeline, or build end-to-end models directly on distributed representations.
• Relation classification with a CNN ("Relation Classification via Convolutional Deep Neural Network", Zeng et al., COLING 2014):
  – traditional methods need complex NLP preprocessing and hand-designed features (e.g., entity type, POS tags, parse trees);
  – the CNN model feeds sentence-level features directly into a convolutional network.
• Piecewise CNN for distant supervision ("Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks", Zeng et al., EMNLP 2015):
  – the earlier single-max-pooling design shrinks the hidden layer too quickly, so the resulting sentence-level features are too coarse;
  – PCNN introduces position-dependent "redundancy": the two entity positions split the sentence into segments, and piecewise max pooling over the segments exploits both internal and external contexts, yielding fine-grained sentence-level features (a minimal sketch of piecewise max pooling appears right after Section 1.3 below);
  – training uses multi-instance learning over bags: suppose there are T bags {M_1, ..., M_T} and the i-th bag contains q_i sentences mentioning the same entity pair; the objective function is J(θ) = Σ_{i=1..T} log p(r_i | m_i^{j*}; θ), with j* = argmax_j p(r_i | m_i^{j}; θ), i.e., each bag is represented by its highest-scoring instance.
• KB-QA by staged query graph generation ("Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base", Yih et al., ACL 2015; state of the art at the time):
  – a query graph is a meaning representation that can be directly mapped to a logical form over the target KB; it contains a topic entity (e.g., Family Guy), a core inferential chain to the answer variable, and constraint nodes (e.g., Meg);
  – query-graph generation is cast as a search problem with staged states and actions; running example: "Who first voiced Meg on Family Guy?";
  – (1) link the topic entity: an advanced entity-linking system for short text (S-MART: "Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking", Yang & Chang, ACL 2015) prepares a surface-form lexicon ℒ for the entities in the KB; all consecutive word sequences are mention candidates scored by the statistical model, and up to 10 top-ranked entities are kept as topic-entity candidates;
  – (2) identify the core inferential chain, i.e., the relation path between the topic entity x and the answer y; two types of chains are explored: length-1 chains to a non-CVT node and length-2 chains that can be grounded through a CVT (compound value type) node;
  – (3) augment constraints: rules add constraints on the core inferential chain: if a linked item is an entity, it can be added as an entity constraint node; keywords such as "first" or "latest" can be added as aggregation constraints;
  – summary: a new framework for semantic parsing of questions, with query-graph generation as staged search, an advanced entity linker, and a convolutional NN for relation matching; new state of the art on WebQuestions;
  – future work: improving the current components, matching relations more accurately, handling constraints in a more principled way, joint structured-output prediction (e.g., SEARN [Daumé III 06]), and extending the query graph to represent more complicated questions;
  – data and code: Sent2Vec (DSSM); system output at http://aka.ms/codalab- ; intermediate files (entity linking, model files, training data, etc.) at http://aka.ms/stagg .

1.3 Document Representation
• The classic scheme is the vector space model over words.
• Knowledge-graph-based representation (Schuhmacher et al.) describes a document through the entities it mentions and the complex semantic relations among them, i.e., through a subgraph of the knowledge graph. Such subgraphs offer a much richer representation space than lexical vectors, and they let document classification, document summarization, and keyword extraction move from string matching toward knowledge-level understanding.
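The piecewise max pooling mentioned under PCNN above can be written down in a few lines. The following is a minimal NumPy sketch, not the authors' code: the feature-map shape, the three-segment convention, and the zero-padding of empty segments are illustrative assumptions.

    import numpy as np

    def piecewise_max_pool(conv_out, e1_pos, e2_pos):
        # conv_out: (seq_len, n_filters) feature map from the convolution layer.
        # e1_pos, e2_pos: token positions of the two entities; they cut the
        # sentence into three segments (before / between / after), and each
        # segment is max-pooled separately instead of pooling the whole sentence.
        left, right = sorted((e1_pos, e2_pos))
        segments = [conv_out[: left + 1],
                    conv_out[left + 1 : right + 1],
                    conv_out[right + 1 :]]
        pooled = []
        for seg in segments:
            if seg.size == 0:                        # zero-pad an empty segment
                pooled.append(np.zeros(conv_out.shape[1]))
            else:
                pooled.append(seg.max(axis=0))
        return np.concatenate(pooled)                # length 3 * n_filters

    # Toy usage: 10 tokens, 4 convolution filters, entities at positions 2 and 6.
    feature_map = np.random.randn(10, 4)
    print(piecewise_max_pool(feature_map, 2, 6).shape)   # -> (12,)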
2. Key Technologies for Building Knowledge Graphs

Outline: 2.1 entity recognition, 2.2 entity disambiguation and entity linking, 2.3 relation extraction, 2.4 knowledge representation and reasoning, 2.5 event knowledge graphs.

2.1 Entity Recognition
• Named entities traditionally cover person, location, and organization names plus numeric expressions (time, date, currency, percentage); some researchers also propose recognizing broader concept mentions.
• Time, date, currency, and percentage expressions have fairly regular surface forms and can largely be recognized with rules (a small rule-based sketch follows at the end of this subsection); person, location, and organization names use characters much more flexibly and are far harder to recognize.
• The same string may take different entity types in different contexts, or only act as an entity under certain conditions, e.g., 苹果 (Apple the company vs. the fruit), 彩霞, 河南, 新世纪.
• Methods (Wu et al., EMNLP): whichever model is used, the key is to fully discover and exploit the context in which the entity occurs. Because each category of named entity has its own characteristics, different categories are usually modeled separately; person names, for instance, are described with character-based models of their internal composition, and different types of foreign names use quite different character inventories, so modeling them with the same character statistics as native names works poorly. Common sequence models include MEMM, HMM, and CRF.
• Evaluation campaigns: MUC, SigHAN, CoNLL, IEER, and ACE. The dedicated NER tracks of MUC-6 and MUC-7 greatly advanced English NER, and the accompanying multilingual entity task (MET) covered Japanese, Chinese, and other languages. SigHAN has run the Chinese word segmentation BAKEOFF since 2003 and added named entity recognition in 2006; the 863-program evaluations on "Chinese information processing and intelligent human-machine interface technology" in 2003 and 2004 also covered Chinese NER.
• English: the NER system developed by the Language Technology Group reaches precision and recall of roughly 95% and 92%, and many English systems can already process large volumes of text. Chinese systems in MET-2 reached (precision, recall) of about (92%, –), (89%, 91%), and (89%, 88%) for person, location, and organization names respectively (吴友政, 2006; Levow, 2006).
• (Tables in the original deck: BAKEOFF training/test corpus sizes such as 1.3M/63K, 100K/13K, 632K, 61K, 1.6M/76K, and 220K/23K word tokens/types; per-system P/R/F scores for sites such as SXU and CITYU (simplified/traditional); and P/R/F by entity type ORG/LOC/PER/GPE from Levow, 2006.)
• Systems score well on the BAKEOFF-3 MSRA and CITYU corpora mainly because those tracks provided fairly large training sets while BAKEOFF-3 LDC provided only a small one, and because the training and test sets are similar in topic and genre, which flatters the results.
• In real application environments NER performance drops sharply. Commonly used tools include CRF++, Stanford NER, and HIT's LTP.
• Web text is irregular and noisy, and much of it does not even form natural-language sentences; the entities of interest are more numerous and finer-grained (e.g., 摩托罗拉V8088折叠手机, 第6届苏迪曼杯羽毛球混合团体赛, 胆…), and some categories are unknown in advance or evolve over time, so open and adaptive recognizers need to be developed.
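The claim above that times, dates, percentages, and monetary amounts can largely be recognized by rules is easy to make concrete. Below is a minimal Python sketch with a few illustrative regular expressions; the pattern set is an assumption for demonstration, not a complete grammar.

    import re

    # Illustrative patterns for the entity types with regular surface forms.
    PATTERNS = {
        "DATE":    re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{2}-\d{2}"),
        "TIME":    re.compile(r"\d{1,2}时\d{1,2}分|\d{1,2}:\d{2}"),
        "PERCENT": re.compile(r"\d+(?:\.\d+)?%|百分之[零一二三四五六七八九十点]+"),
        "MONEY":   re.compile(r"[$¥€]\d+(?:,\d{3})*(?:\.\d+)?(?:亿|万)?"
                              r"|\d+(?:\.\d+)?(?:亿|万)?(?:元|美元)"),
    }

    def tag_regular_entities(text):
        # Return (start, end, type, surface) spans for rule-matchable entities.
        spans = []
        for etype, pat in PATTERNS.items():
            for m in pat.finditer(text):
                spans.append((m.start(), m.end(), etype, m.group()))
        return sorted(spans)

    print(tag_regular_entities("2013年4月20日8时02分,GDP增长7.5%,融资$7.2亿。"))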
2.2 Entity Disambiguation and Entity Linking
• Entity linking is not limited to links between text and a knowledge base; mentions can also be linked to each other across documents and other resources.
• Clustering-based disambiguation assumes that mentions of the same entity occur in similar contexts and clusters the mentions accordingly:
  – bag-of-words model (Bagga and Baldwin, COLING 1998);
  – semantic features (Pedersen et al., CICLing 2005);
  – social networks (Bekkerman and McCallum, WWW 2005);
  – Wikipedia knowledge (Han and Zhao, CIKM 2009);
  – fusion of multi-source heterogeneous semantic knowledge (Han and Zhao, ACL 2010).
• Bag-of-words model (Bagga and Baldwin, COLING 1998): represent each mention by its surrounding words and use vector-space similarity to decide whether two mentions refer to the same entity, e.g., "MJ1: Michael Jordan is a researcher in machine learning" vs. "MJ2: Michael Jordan plays basketball in the Chicago Bulls" (a small similarity sketch follows at the end of this subsection).
• Pedersen et al. (CICLing 2005) add second-order semantic features via SVD; Bekkerman and McCallum (WWW 2005) exploit the social network: MJ (basketball) co-occurs with Pippen, Buckley, Ewing, and Kobe, while MJ (machine learning) co-occurs with Liang, Mackey, and other researchers.
• Wikipedia-based disambiguation (Han and Zhao, CIKM 2009): following Milne and Witten (2008), two Wikipedia concepts are more semantically related the more related concepts link to both of them over the whole Wikipedia link graph; mention similarity is then computed from the relatedness of the Wikipedia concepts appearing in the two contexts, which separates "MJ1: Michael Jordan is a researcher in machine learning" (Machine Learning, Graphical Models) from the basketball player. On the WePS data, the structured-semantic-relatedness kernel improves disambiguation by about 10.7%.
• Multi-source heterogeneous knowledge (Han and Zhao, ACL 2010): considering Wikipedia alone is not sufficient; mining and integrating knowledge from multiple heterogeneous sources (Wikipedia, web pages, ...) further improves disambiguation. The knowledge is organized into a semantic graph (equivalent concepts, concept links, ...) whose structure models the hidden semantic relations between concepts; the computation principle is that if a concept's neighbours are semantically related to another concept, the concept itself is related to it. On WePS, using multi-source knowledge effectively raises disambiguation accuracy.
• Evaluation of clustering-based disambiguation: WePS (Web People Search); WePS-1 was a SemEval-2007 task; the task is person-name disambiguation on the web, i.e., clustering the pages that contain a given ambiguous person name, and systems are scored by comparing the induced clusters with the gold clusters. Most existing work improves disambiguation by adding richer features and more knowledge.

Entity linking
• Task: given an entity mention and the text it occurs in, link the mention to the corresponding entry of a given knowledge base. Example mention text: "Michael Jordan is a former NBA player, active businessman and majority owner of the Charlotte Bobcats"; candidate entities include Michael Jordan (basketball player), Michael Jordan (mycologist), Michael Jordan (footballer), Michael B. Jordan, Michael H. Jordan, Michael Jordan (Irish politician), ...
• Acronym expansion (Zhang et al., IJCAI 2011): acronyms are very common among mentions; in the KBP 2009 test data, 827 of the 3,904 mentions are acronyms. An acronym is highly ambiguous while its expansion usually is not (ABC vs. American Broadcasting Company, AI vs. Artificial Intelligence), and the expansion often appears somewhere in the mention's document, so hand-crafted rules are used to extract the expansion and link it in place of the acronym.
• Basic approach: compute the similarity between the mention (plus its context) and each candidate entity, and choose the candidate with the highest similarity:
  – bag-of-words models (Honnibal, TAC 2009; Bikel et al., TAC 2009): represent the mention context and the candidate entity's text as word vectors; because the candidate's text can be very short, the similarity may be unreliable;
  – candidate category features (Bunescu and Pasca, EACL 2006): besides text similarity, use the co-occurrence between context words and categories such as Music or Art, e.g., John Williams (composer): Category = {Music, Art, ...} vs. John Williams (wrestler): Category = {Sport, ...} vs. John Williams (VC); an SVM ranker over candidates is trained on examples harvested from Wikipedia hyperlinks. Context cues: "Williams has also composed numerous classical concerti, and he served as the principal conductor of the Boston Pops Orchestra from 1980 to 1993" points to the composer, while "During his standout career Jordan also acts in the movie Space Jam" points to the basketball player;
  – entity popularity and related features (Han et al., ACL);
  – collective approaches that exploit the other entities in the same document: category co-occurrence between entities (Cucerzan, EMNLP 2007), link structure between entities (Kulkarni et al., KDD 2009), and semantic associations among the entities of one document (Han et al., SIGIR 2011), typically with pairwise optimization over all mentions in the document.
• Deep-learning-based linking (He et al., ACL): in traditional methods the similarity between the mention context and the target entity is computed with hand-designed functions that miss the intrinsic relations between concepts; as in collaborative filtering, representation learning can learn this similarity directly from data.
• Entity linking in social data (Shen et al.): social media such as Twitter is an important information source, but tweets are short and informally written; the user's interest profile and the interactions around a tweet provide additional evidence.
• Evaluation: TAC-KBP entity linking (2009-now): link target mentions in text to the corresponding Wikipedia concepts; the 2013 evaluation reports micro-averaged accuracy.
• Summary: current entity-linking methods focus mainly on mining the mention-side information more effectively; several difficulties remain open.
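As a concrete illustration of the bag-of-words similarity that underlies both the clustering-based disambiguation and the basic linking approach above, here is a minimal Python sketch; the toy tokenizer and the example texts are illustrative assumptions.

    import math
    import re
    from collections import Counter

    def bow(text):
        # Lower-cased bag-of-words vector over alphabetic tokens (toy tokenizer).
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def cosine(v1, v2):
        dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
        norm = math.sqrt(sum(c * c for c in v1.values())) * \
               math.sqrt(sum(c * c for c in v2.values()))
        return dot / norm if norm else 0.0

    # Two mentions of "Michael Jordan" in different contexts, plus the text of
    # one candidate knowledge-base entity.
    mj1 = bow("Michael Jordan is a researcher in machine learning")
    mj2 = bow("Michael Jordan plays basketball in the Chicago Bulls")
    candidate = bow("Michael Jordan, a former NBA basketball player of the Chicago Bulls")

    # Clustering-based disambiguation compares mention contexts with each other;
    # entity linking compares a mention context with each candidate's text.
    print(cosine(mj1, mj2))
    print(cosine(mj2, candidate))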
2.3 Relation Extraction
• Relation extraction is the core technology of knowledge graph construction; it largely determines how much knowledge can be acquired and at what quality.
• Researchers at Stanford proposed the distant supervision idea: use an existing knowledge base to label free text automatically, treating every sentence that contains a related entity pair as a training instance. Because distant supervision can only mechanically match sentences in which the entity pair co-occurs, it inevitably produces wrongly labeled instances.
• The mainstream approach is statistical machine learning, casting relation instances into one of three kinds of models:
  – feature-vector methods: maximum entropy (Kambhatla, 2004) and support vector machines (Zhao et al., 2005; Zhou et al., 2005; Jiang et al., 2007);
  – kernel methods: shallow tree kernels (Zelenko et al., 2003), dependency tree kernels (Culotta et al., 2004), shortest dependency path kernels (Bunescu et al., 2005), and convolution tree kernels (Zhang et al., 2006; Zhou et al., 2007);
  – neural methods: recursive neural networks and matrix-space recursive networks (Socher et al., 2012), and convolutional neural networks (Zeng et al., 2014).
• Feature-vector methods: the main problem is how to obtain effective lexical, syntactic, and semantic features, e.g., the entity words and their contexts, entity types and type combinations, reference patterns, overlap features, base-chunk features, and parse-tree features.
• Kernel methods: the main problem is how to mine the structural information that reflects the semantic relation and how to compute structural similarity effectively. A convolution tree kernel measures the similarity of two parse trees by the number of their common subtrees; the standard kernel considers only the subtrees themselves, whereas the context-sensitive convolution tree kernel (CS-CTK) also uses ancestor information such as the parent of a subtree's root.
• Neural methods: the main problem is how to design a network structure that captures more useful information. Recursive networks follow the syntactic structure of the sentence and therefore depend on complex parsing; convolutional networks capture sentence-level information through convolution without complex NLP preprocessing.
• CNN-based relation extraction (Zeng et al., COLING 2014, best paper): example sentence "2013年4月20日8时02分四川省雅安市[芦山县]e1发生了7.0级[地震]e2" and the task of classifying the relation between the two marked entities (compare "汶川地震震中在汶川"). Traditional features for this sentence would include entity types (Noun m1, Location m2), parse-tree paths (Location-VP-PP-Noun), and kernel features, which exposes three problems: (1) such features are unavailable for languages or domains that lack NLP tools; (2) the errors made by NLP tools accumulate along the pipeline; (3) feature engineering is expensive. The CNN model instead uses word embeddings to capture lexical semantics, lexical-level features for the entities themselves, and sentence-level features mined by the CNN from word features and position features (WF + PF); on the SemEval-2010 Task 8 English data it is compared with feature sets built from POS tags, prefixes, Levin classes, morphological features, WordNet, FrameNet, dependency parses, NomLex-Plus, and context words.
• Comparison (reconstructed from the original table): feature-vector methods are fast and suit large-scale data, but their performance is hard to push further; kernel methods capture richer structure, but tree-kernel training is slow, which limits the data scale; neural methods learn the features automatically.
• Open setting: many more relation types are needed, and the entity types should not be restricted in advance.

Open-domain entity (set) expansion
• Given seed instances such as <中国, 美国, 俄罗斯>, find other countries such as <德国, 英国, 法国>.
• Basic idea: seed terms and target terms share the same or similar contexts and web-page structures.
  – Step 1: seeds → patterns (templates); Step 2: patterns → candidate instances (a minimal bootstrapping sketch follows at the end of this subsection).
• Different data sources can be used, e.g., query logs, web documents, and knowledge-base documents.
  – Query logs (Pasca, CIKM 2007): learn patterns from the query-log contexts of the seed instances and use them to find new instances, e.g., "联想笔记本如何...", "苹果笔记本...", "戴尔笔记本...".
  – Web pages (Wang, ICDM 2007): within a list, the seeds and the target entities share the same page structure. The fetcher sends the seeds to a search engine and downloads the top 100 returned pages; the extractor learns a wrapper (template) for each single page and applies it to extract candidates; the ranker builds a graph over seeds, pages, wrappers, and candidates, takes page quality into account, and scores the candidates with a random-walk algorithm.
• Evaluation: there is no widely accepted benchmark for instance expansion; researchers evaluate on datasets they build themselves. Because the output is a ranked list, plain precision is not informative, so the MAP (mean average precision) measure common in TREC is used: AP = Σ_r Prec(r) · NewEntity(r) / (number of correct new entities), where Prec(r) is the precision of the list down to rank r and NewEntity(r) indicates whether the entity at rank r is a correct new entity. Wang (2007) reports results on 12 self-built datasets, taking the top 100 web pages as the corpus.
• Summary: these methods generally consist of two modules, pattern extraction and candidate confidence estimation, and are mostly unsupervised.
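The two-step loop (seeds → patterns → candidates) sketched above can be illustrated on a tiny in-memory "corpus". The following Python sketch is only a toy: the sentences, the fixed-width pattern window, and the frequency-based confidence are illustrative assumptions rather than the method of any particular paper.

    from collections import Counter

    corpus = [
        "中国的首都是北京", "美国的首都是华盛顿", "俄罗斯的首都是莫斯科",
        "德国的首都是柏林", "法国的首都是巴黎", "苹果的总部在库比蒂诺",
    ]
    seeds = {"中国", "美国", "俄罗斯"}

    def learn_patterns(corpus, seeds, width=4):
        # Step 1: take the characters immediately after a seed as a candidate pattern.
        patterns = Counter()
        for sent in corpus:
            for seed in seeds:
                if sent.startswith(seed):
                    patterns[sent[len(seed): len(seed) + width]] += 1
        # Keep only patterns supported by at least two different seed sentences.
        return {p for p, c in patterns.items() if c >= 2}

    def extract_candidates(corpus, patterns, seeds):
        # Step 2: any sentence containing a learned pattern yields the text before
        # the pattern as a candidate instance; the number of matching patterns
        # serves as a crude confidence score.
        candidates = Counter()
        for sent in corpus:
            for pat in patterns:
                idx = sent.find(pat)
                if idx > 0:
                    filler = sent[:idx]
                    if filler and filler not in seeds:
                        candidates[filler] += 1
        return candidates

    patterns = learn_patterns(corpus, seeds)
    print(patterns)                                              # {'的首都是'}
    print(extract_candidates(corpus, patterns, seeds).most_common())
    # -> [('德国', 1), ('法国', 1)]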
2.4 Knowledge Representation
• Storage of knowledge: storing entities and relations in a graph database keeps the connections between facts explicit ("better tells a story" than isolated triples).
• Reasoning over knowledge graphs: inductive logic programming (ILP), probabilistic graphical models, Markov logic networks, and probabilistic soft logic.
• Logic-based representation: strong expressive power, understandable by humans, and able to deliver exact results; but it is difficult to scale efficient logical inference to knowledge bases as large as Freebase.
• Distributed representation learning: the vector of every entity and relation is obtained by optimizing an objective over the whole knowledge base, so global regularities are encoded into the representations and carried into inference.
• Logic vs. distributed representations (reconstructed comparison): inference scope — local vs. global; efficiency — low vs. high; accuracy — high vs. slightly lower; human readability — easy vs. hard; cross-domain transfer — hard (experts must design seed rules) vs. easy.
• Example of rule-based inference: 西兰花含钙 (broccoli contains calcium) and 钙能有效预防骨质疏松 (calcium effectively prevents osteoporosis) yield 西兰花可以防止骨质疏松 (broccoli helps prevent osteoporosis) via the rule "a vegetable containing an element that prevents osteoporosis can prevent osteoporosis"; the answer fact is derived over a semantics tree built from the inference rules and the relations involved (含有/contains, 预防/prevents).
• Path Ranking Algorithm (PRA) [Lao et al., EMNLP 2011]: learn a classifier per relation from the link paths that connect entity pairs in the graph; the connecting paths of a relation's entity pairs serve as features, and the highest-weighted paths become inference rules for that relation. Follow-up work addresses efficient path finding, longer paths, and backward random walks [Lao et al., ACL]; adding constraints on the random walk and searching the path from both ends improves efficiency and interpretability.
• Path-constrained random walk: P(s → t | π), the probability of reaching t from s along path type π, is calculated by dynamic programming or particle filtering. Computing it forward for every possible π is either very expensive or non-exhaustive: with O(10^2) computation cost there is roughly a 1% chance of finding the target in an O(10^4) space, so forward and backward random walks are combined.
• Embedding models: both head and tail entities are represented as vectors (symbols vs. vectors); RESCAL (Nickel, Tresp, and Kriegel) factorizes the knowledge tensor.
• TransE (Bordes et al., NIPS 2013): for each triple (head, relation, tail), the relation is learned as a translation from head to tail, with learning objective h + r ≈ t, e.g., (USA, _president, Obama): v(USA) + v(_president) ≈ v(Obama) (a minimal training sketch appears just before the reference list below). TransE handles complex 1-to-N, N-to-1, and N-to-N relations poorly.
• TransH (Wang et al., 2014, "Knowledge graph embedding by translating on hyperplanes") builds relation-specific entity representations by projecting entities onto relation hyperplanes; TransR (Lin et al., 2015) separates the entity space from per-relation spaces via a projection matrix M_r, which better handles the 1-N / N-1 / N-N cases; both are evaluated on link prediction.
• Path-based embedding, PTransE (Lin et al., 2015, "Modeling Relation Paths for Representation Learning of Knowledge Bases"): relation paths are incorporated into the representation and improve both entity prediction and relation prediction. (The original deck shows link-prediction and entity-prediction result tables here.)
• Relation imbalance (Ji et al., 2015; 2016): in FB15k, 37.8% of the relations link only a small number of entity pairs, while only 29.6% link more than 100; relations are (1) heterogeneous — some link many entity pairs (complex relations) while others link very few (simple relations) — and (2) unbalanced — the numbers of distinct head and tail entities of a relation can differ greatly. Earlier methods give every relation's transfer matrix the same number of free variables; relations of different complexity should instead be learned with models of different expressive power (dynamic mapping matrices; adaptive sparse transfer matrices).
• Jointly embedding text and the KG on NYT + Freebase (Weston et al.): the KG contains rich information besides the network structure; a textual mention m and a relation r are scored by S_m2r(m, r) = f(m)^T r, where f maps words and features into R^k via f(m) = W^T Φ(m); W ∈ R^{n_v × k} contains all word/feature embeddings, Φ(m) ∈ R^{n_v} is the sparse binary vector indicating the absence or presence of words/features in m, and r ∈ R^k is the embedding of the relation.
• Holographic embeddings (HolE) use circular correlation to combine the expressive power of the tensor product with the efficiency and simplicity of TransE; circular correlation can be interpreted as a compression of the tensor product that shares weights between semantically similar components.
• "Representation Learning of Knowledge Graphs with Entity Descriptions": enhance the entity representations with textual descriptions, modeling the descriptions with a CNN.

2.5 Event Knowledge Graphs
• Related work: within one document, Joint Inference for Event Timeline Construction (Do et al., ACL 2012); across documents, Building Event Threads out of Multiple News Articles (Tannier et al., EMNLP 2013); Generating Event Storylines from Microblogs (Li et al.); Mining the Web to Predict Future Events (Radinsky et al., WSDM); Using Structured Events to Predict Stock Price Movement: An Empirical Investigation (Ding et al., EMNLP 2014).
• An event is defined as a tuple (Actor, Action, Object, Time). Example: the event "Sep 3, 2013 - Microsoft agrees to buy Nokia's mobile phone business for $7.2 billion" becomes (Actor = Microsoft, Action = buy, Object = Nokia's mobile phone business, Time = Sep 3, 2013).
• Goal: obtain structured event information from massive news data, e.g., "Private sector adds 114,000 jobs in ..." → (private sector, adds, 114,000 jobs), with bag-of-words and composed features such as O1+P, P+O2, and O1+P+O2.
• Challenges: moving from well-edited text to noisy, redundant web-scale data; from pipelined extraction to joint extraction that models the interactions among the parts of an event; from a small fixed set of event types to open types.
• Difficulties: existing event type hierarchies are built by hand and remain small; events are not independent of each other, so the relations between events must be explored and modeled; event extraction is usually cast as multi-class classification.

Other applications
• Anti-fraud and credit scoring: extract features from both the user side and the entity side of the graph; examples come from inclusive finance (普惠金融), e.g., 宜信 and 同….
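To make the translation objective h + r ≈ t from 2.4 concrete, here is a minimal NumPy sketch of TransE training with the usual margin loss over corrupted triples. The toy knowledge base, embedding dimension, and hyper-parameters are illustrative assumptions, not settings from any cited paper.

    import numpy as np

    rng = np.random.default_rng(0)
    entities = ["USA", "Obama", "France", "Hollande"]
    relations = ["_president"]
    triples = [("USA", "_president", "Obama"), ("France", "_president", "Hollande")]

    dim, margin, lr = 16, 1.0, 0.01
    E = {e: rng.normal(scale=0.1, size=dim) for e in entities}
    R = {r: rng.normal(scale=0.1, size=dim) for r in relations}

    def score(h, r, t):
        # TransE energy ||h + r - t||: lower means the triple is more plausible.
        return np.linalg.norm(E[h] + R[r] - E[t])

    for epoch in range(200):
        for (h, r, t) in triples:
            t_neg = rng.choice([e for e in entities if e != t])   # corrupt the tail
            loss = margin + score(h, r, t) - score(h, r, t_neg)
            if loss > 0:                              # hinge loss: update on violation only
                grad_pos = E[h] + R[r] - E[t]
                grad_pos /= (np.linalg.norm(grad_pos) + 1e-8)
                grad_neg = E[h] + R[r] - E[t_neg]
                grad_neg /= (np.linalg.norm(grad_neg) + 1e-8)
                E[h] -= lr * (grad_pos - grad_neg)
                R[r] -= lr * (grad_pos - grad_neg)
                E[t] += lr * grad_pos
                E[t_neg] -= lr * grad_neg

    # The true triple should typically now score lower (better) than a corrupted one.
    print(score("USA", "_president", "Obama"), score("USA", "_president", "Hollande"))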
References

Bagga, A. and Baldwin, B. 1998. Entity-based cross-document coreferencing using the vector space model. In Proceedings of the 17th International Conference on Computational Linguistics, Volume 1, pp. 79-85.
Banko, M., Cafarella, M., Soderland, S., Broadhead, M., and Etzioni, O. Open information extraction from the web. In IJCAI, 2007.
Bekkerman, R. and McCallum, A. 2005. Disambiguating web appearances of people in a social network. In Proceedings of the 14th International Conference on World Wide Web, pp. 463-.
Bikel, D., et al. Entity Linking and Slot Filling through Statistical Processing and Inference Rules. In Proceedings of TAC.
Bunescu, R. and Pasca, M. Using Encyclopedic Knowledge for Named Entity Disambiguation. In Proceedings of EACL, 2006.
Cucerzan, S. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of EMNLP, 2007.
Zhou, G., Su, J., Zhang, J., and Zhang, M. Exploring various knowledge in relation extraction. In Proceedings of ACL, 2005.
Buitelaar, P., Cimiano, P., and Magnini, B. Ontology Learning from Text: Methods, Evaluation and Applications. In Frontiers in Artificial Intelligence and Applications Series, 2005.
Ponzetto, S. P. and Strube, M. Deriving a large scale taxonomy from Wikipedia. In AAAI.
Ponzetto, S. P. and Strube, M. WikiTaxonomy: A large scale knowledge resource. In ECAI.
Ponzetto, S. P. and Navigli, R. Large-scale taxonomy mapping for restructuring and integrating Wikipedia. In IJCAI, 2009.
Nastase, V., Strube, M., Boerschinger, B., Zirn, C., and Elghafari, A. WikiNet: A very large scale multi-lingual concept network. In LREC, 2010.
de Melo, G. and Weikum, G. MENTA: Inducing multilingual taxonomies from Wikipedia. In CIKM.
Navigli, R. and Ponzetto, S. P. BabelNet: the automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 2012.
Wang, Z., Li, J., et al. Cross-lingual Knowledge Validation Based Taxonomy Derivation from Heterogeneous Online Wikis. In AAAI, 2014.
Fountain, T. and Lapata, M. Taxonomy induction using hierarchical random graphs. In ACL.
Navigli, R., Velardi, P., and Faralli, S. A graph-based algorithm for inducing lexical taxonomies from scratch. In IJCAI, 2011.
Cimiano, P., Hotho, A., and Staab, S. Learning concept hierarchies from text corpora using formal concept analysis. JAIR, 2005.
Kozareva, Z. and Hovy, E. A Semi-Supervised Method to Learn and Construct Taxonomies using the Web. In EMNLP, 2010.
… A Graph Algorithm for Taxonomy Induction. In COLING.
Yang, H. and Callan, J. A metric-based framework for automatic taxonomy induction. In ACL, 2009.
Wu, W., Li, H., et al. Probase: A Probabilistic Taxonomy for Text Understanding. In SIGMOD, 2012.
Davidov, D., Rappoport, A., and Koppel, M. Fully unsupervised discovery of concept-specific relationships by web mining. In ACL, 2007.
Hovy, E. H., Kozareva, Z., and Riloff, E. Toward completeness in concept extraction and classification. In EMNLP, 2009.
Kozareva, Z., Riloff, E., and Hovy, E. Semantic class learning from the web with hyponym pattern linkage graphs. In ACL, 2008.
Pantel, P. and Pennacchiotti, M. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In ACL, 2006.
Lee, T., Wang, Z., Wang, H., and Hwang, S. Web scale taxonomy cleansing. In VLDB, 2011.
Ritter, A., Soderland, S., and Etzioni, O. What is this, anyway: Automatic hypernym discovery. In AAAI, 2009.
Klapaftis, I. P. and Manandhar, S. Taxonomy learning using word sense induction. In Proceedings of NAACL-HLT, 2010.
Kulkarni, S., et al. Collective Annotation of Wikipedia Entities in Web Text. In Proceedings of KDD, 2009.
Han, X. and Zhao, J. 2009. Named entity disambiguation by leveraging Wikipedia semantic knowledge. In Proceedings of the ACM Conference on Information and Knowledge Management, pp. 215-224.
Han, X. and Zhao, J. 2010. Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation. In Proceedings of ACL, pp. 50-59.
Han, X. and Sun, L. A Generative Entity-Mention Model for Linking Entities with Knowledge Base. In Proceedings of ACL, 2011.
Han, X., et al. Collective Entity Linking in Web Text: A Graph-Based Method. In Proceedings of SIGIR, 2011.
Medelyan, O., Witten, I. H., and Milne, D. 2008. Topic Indexing with Wikipedia. In Proceedings of the AAAI 2008 Workshop on Wikipedia and Artificial Intelligence (WIKIAI 2008), Chicago, IL.
Mihalcea, R. and Csomai, A. 2007. Wikify!: linking documents to encyclopedic knowledge. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM '07), Lisbon, Portugal, pp. 233-242.
Milne, D. and Witten, I. 2008. Learning to link with Wikipedia. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM '08), Napa Valley, California, USA, pp. 519-529.
Pedersen, T., Purandare, A., and Kulkarni, A. 2005. Name discrimination by clustering contexts. In Computational Linguistics and Intelligent Text Processing, pp. 226-.
Wu, Y., Zhao, J., and Xu, B. Chinese Named Entity Recognition Model Based on Multiple Features. In Proceedings of the Joint Conference of Human Language Technology and EMNLP.
Wu, F. and Weld, D. Autonomously semantifying Wikipedia. In CIKM, 2007.
Wu, F. and Weld, D. Open information extraction using Wikipedia. In ACL, 2010.
Zhao, J. and Liu, F. Product Named Entity Recognition in Chinese Texts. International Journal of Language Resources and Evaluation (LRE), 42(2): 132-152, 2008.
Zhang, W., et al. Entity Linking with Effective Acronym Expansion, Instance Selection and Topic Modeling. In Proceedings of IJCAI, 2011.
Zhang, M., Zhang, J., and Su, J. 2006a. Exploring syntactic features for relation extraction using a convolution tree kernel. In Proceedings of HLT-NAACL.
NIST 2005. Automatic Content Extraction Evaluation Official Results [2007-09-28]. http://
NIST 2007. Automatic Content Extraction Evaluation Official Results [2007-09-28]. http://
McNamee, P. Overview of the TAC 2009 Knowledge Base Population Track. In Proceedings of the TAC Workshop, 2009.
Poon, H. Markov Logic for Machine Reading. Ph.D. dissertation, University of Washington.
Schoenmackers, S. Inference Over the Web. Ph.D. dissertation, University of Washington.
Schoenmackers, S., Davis, J., Etzioni, O., and Weld, D. Learning First-Order Horn Clauses from Web Text. In Proceedings of EMNLP, 2010.
Schoenmackers, S., Etzioni, O., and Weld, D. Scaling Textual Inference to the Web. In Proceedings of EMNLP, 2008.
Nickel, M., Tresp, V., and Kriegel, H.-P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of ICML, 2011.
Nickel, M., Tresp, V., and Kriegel, H.-P. Factorizing YAGO: Scalable Machine Learning for Linked Data. In Proceedings of WWW, 2012.
Nickel, M. and Tresp, V. Logistic Tensor Factorization for Multi-Relational Data. In Proceedings of ICML, 2013a.
Nickel, M. and Tresp, V. Tensor Factorization for Multi-Relational Learning. In Machine Learning and Knowledge Discovery in Databases, Springer, 2013b.
Gabrilovich, E. 2015. A review of relational learning for knowledge graphs. In Proceedings of the …
Bordes, A., Weston, J., Collobert, R., and Bengio, Y. Learning structured embeddings of knowledge bases. In Proceedings of AAAI, 2011, pp. 301-306.
Bordes, A., Usunier, N., and Garcia-Duran, A. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of NIPS, 2013, pp. 2787-2795.
Wang, Z., Zhang, J., Feng, J., and Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of AAAI, 2014, pp. 1112-1119.
Lin, Y., Zhang, J., Liu, Z., Sun, M., Liu, Y., and Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of AAAI, 2015.
Bordes, A., Glorot, X., Weston, J., and Bengio, Y. Joint learning of words and meaning representations for open-text semantic parsing. In Proceedings of AISTATS, 2012.
Bordes, A., Glorot, X., Weston, J., and Bengio, Y. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2): 233-259.
Jenatton, R., Le Roux, N., Bordes, A., and Obozinski, G. A latent factor model for highly multi-relational data. In Proceedings of NIPS, 2012, pp. 3167-3175.
Sutskever, I., Salakhutdinov, R., and Tenenbaum, J. B. Modelling Relational Data using Bayesian Clustered Tensor Factorization. In Proceedings of NIPS, 2009, pp. 1821-1828.
Socher, R., Chen, D., Manning, C. D., and Ng, A. Y. Reasoning With Neural Tensor Networks for Knowledge Base Completion. In Proceedings of NIPS, 2013, pp. 926-934.
Ji, G., He, S., Xu, L., Liu, K., and Zhao, J. 2015. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of ACL, pp. 687-696.
Guo, S., Wang, Q., Wang, B., Wang, L., and Guo, L. 2015. Semantically smooth knowledge graph embedding. In Proceedings of ACL, pp. 84-94.
Lin, Y., Liu, Z., Luan, H., Sun, M., Rao, S., and Liu, S. 2015b. Modeling relation paths for representation learning of knowledge bases. In Proceedings of EMNLP, pp. 705-714.
Levow, G.-A. The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition. In Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney: Association for Computational Linguistics, 2006.
Ahn, D. 2006. The stages of event extraction. In Proceedings of the Workshop on Annotating and Reasoning about Time and Events, pp. 1-8. Association for Computational Linguistics.
Hong, Y., Zhang, J., Ma, B., Yao, J., Zhou, G., and Zhu, Q. 2011. Using cross-entity inference to improve event extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1.
Ji, H. and Grishman, R. 2008. Refining event extraction through cross-document inference. In ACL, pp. 254-262.
Li, Q., Ji, H., and Huang, L. 2013. Joint event extraction via structured prediction with global features. In ACL (1), pp. 73-82.
Li, Q., Ji, H., Hong, Y., and Li, S. 2014. Constructing information networks using one single model. In Proceedings of EMNLP 2014.
Liao, S. and Grishman, R. 2010. Using document level cross-event inference to improve event extraction. In Proceedings of ACL, pp. 789-797. Association for Computational Linguistics.
Zeng, D., Liu, K., Lai, S., Zhou, G., and Zhao, J. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING, pp. 2335-2344.
He, S., Liu, K., Zhang, Y., and Zhao, J. Question Answering over Linked Data Using First Order Logic. In EMNLP, 2014.
Zeng, D., et al. Distant Supervision for Relation Extraction via Piecewise Convolutional Neural Networks. In EMNLP, 2015.
Wei, Z., Zhao, J., Liu, K., Qi, Z., et al. Large-scale Knowledge Base Completion: Inferring via Grounding Network Sampling over Selected Instances. In CIKM, 2015.
He, S., Liu, K., and Zhao, J. Learning to Represent Knowledge Graphs with Gaussian Embedding. In CIKM, 2015.
Liu, S., Liu, K., and Zhao, J. A Probabilistic Soft Logic Based Approach to Exploit Latent and Global Information in Event Classification. In AAAI, 2016.
Ji, G., He, S., Liu, K., and Zhao, J. Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. In AAAI, 2016.
Zhang, Y., He, S., Liu, K., and Zhao, J. A Joint Model for Question Answering over Multiple Knowledge Bases. In AAAI, 2016.
徐从富, 郝春, 苏保君, 楼俊杰. 马尔科夫逻辑网络研究 (Research on Markov logic networks). 软件学报 (Journal of Software).
863计划中文信息处理与智能人机交互技术评测: 命名实体评测结果报告 (Named entity evaluation results report) [R]. 北京: 863计划中文信息处理与智能人机接口技术评测组, 2004.
吴友政. 问答系统关键技术研究 (Research on key technologies for question answering systems). Ph.D. dissertation, Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所).
[Zelle, 1995] Zelle, J. M. and Mooney, R. J. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artificial Intelligence, 1996, pp. 1050-1055.
[Wong, 2007] Wong, Y. W. and Mooney, R. J. Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007.
[Lu, 2008] Lu, W., Ng, H. T., Lee, W. S., and Zettlemoyer, L. S. A generative model for parsing natural language to meaning representations. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 783-792.
[Zettlemoyer, 2005] Zettlemoyer, L. S. and Collins, M. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, 2005, pp. 658-666.
[Clarke, 2010] Clarke, J., Goldwasser, D., Chang, M.-W., and Roth, D. Driving semantic parsing from the world's response. In Proceedings of the 14th Conference on Computational Natural Language Learning, 2010, pp. 18-27.
[Liang, 2011] Liang, P., Jordan, M. I., and Klein, D. Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011.
