
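Introduction to Encyclopedic and Buddhist Knowledge Graph Construction Technologies

漆桂林 (Guilin Qi), Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
  • Encyclopedic knowledge graph construction technologies
  • Buddhist knowledge graph construction technologies

Introduction of Knowledge Bases
  • What is knowledge? Facts, information, descriptions, or skills, acquired through experience or education by perceiving, discovering, or learning.
  • Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc.
  • A knowledge base is a principal part of expert systems.
  • "The power of an AI program came to be seen as largely in its knowledge base." (Edward Feigenbaum, 1994 ACM Turing Award)

Development of Knowledge Base in Recent Decades

[Timeline figure. Axis: 1985, 1990, 1995, 2000, 2005, 2010, 2012. Recoverable labels: (#$capitalCity #$France #$Paris); student / enrollee / person; "35 million articles in 288 different languages"; "15 thousand concepts, 600 million instances, 20 billion facts"; NELL]

Google Knowledge Graph (KG)
  • It is a new generation of intelligent search technology, which enables you to search for things, not strings.
  • Formal definition: a knowledge graph is a knowledge base with graph structure, where the nodes are instances or concepts, and the edges are relations between them.
  • It is a special semantic network.
  • It belongs to knowledge engineering.

Example

[Figure: a company knowledge graph centered on 中兴通讯 (ZTE).
 Node types: 上市公司 (listed company), 非上市公司 (unlisted company).
 Relations: 子公司 (subsidiary), 供应商 (supplier), 客户 (customer), 竞争对手 (competitor), 合作伙伴 (partner).
 Entities: 中兴康讯, Acacia (IPO in progress), 卓翼科技, 美国高通 (Qualcomm), 共进股份, 宇顺电子, 美国博通 (Broadcom), 中国移动 (China Mobile), 英特尔 (Intel), 华为 (Huawei), 中国联通 (China Unicom), 大富科技, 华星创业, 盛路通信, 超声电子]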
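The formal definition above (nodes are instances or concepts, edges are relations) can be sketched as a list of triples. This is a minimal illustration, not the representation used by any particular system. The entity and relation names are taken from the example figure, but which entity carries which relation is an assumption made here, and `build_graph` and `neighbors` are hypothetical helper names.

```python
from collections import defaultdict

# (head, relation, tail) triples. Names come from the example figure;
# the pairing of entities with relations is ASSUMED for illustration.
TRIPLES = [
    ("中兴通讯", "instanceOf", "上市公司"),  # instance -> concept
    ("中兴通讯", "子公司", "中兴康讯"),      # instance -> instance
    ("中兴通讯", "客户", "中国移动"),
    ("中兴通讯", "竞争对手", "华为"),
]


def build_graph(triples):
    """Index triples by head node: node -> list of (relation, tail)."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return graph


def neighbors(graph, node, relation):
    """Tails reachable from `node` over edges labelled `relation`."""
    return [tail for rel, tail in graph.get(node, []) if rel == relation]


graph = build_graph(TRIPLES)
print(neighbors(graph, "中兴通讯", "客户"))
```

Searching "for things, not strings" then amounts to looking up a node and following its labelled edges rather than matching text.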

KG and Semantic Search
  • Go deeper and broader

Technologies of Knowledge Base Construction

[Figure labels: Baidu, Hudong, Zh-Wikipedia]

Knowledge Graph (KG) Construction from Online Encyclopedias
  • Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
  • Technologies of encyclopedic knowledge graph construction: data extraction, entity matching, type inference.

Zhishi.me
  • Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link them together as Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
  • Currently, it consists of structured data extracted from the three largest Chinese encyclopedia sites: Baidu Baike, Hudong Baike, and Chinese Wikipedia.
  • It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service, and a SPARQL endpoint.

Data Extraction
  • Extracted content: labels, abstracts, redirects, images
  • Properties: rdfs:label, zhishi:abstract, rdfs:comment, dbpedia:abstract, zhishi:pageRedirects, zhishi:thumbnail
  Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220

Data Extraction: infobox properties
  • URI pattern: http://zhishi.me/[sourceName]/property/[propertyName]
  • Example: http://zhishi.me/baidubaike/property/中文名称 with the value "南京"@zh

Data Extraction: internal links
  • Properties: zhishi:internalLink, zhishi:category, skos:broader

Entity Matching
  • Baidu:北京 and Zh-Wiki:北京市 are equivalent entities.
  • Dataset-specific matching rules are discovered and refined automatically, in iterations.
  • The rules are derived by finding the most discriminative data characteristics for a given data source pair, e.g. (baidu:北京, Zh-wiki:北京市).
  (From Haofen Wang)
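To make the infobox property naming scheme from the Data Extraction slides concrete, here is a small sketch that fills the two slots of the URI pattern and attaches a language-tagged value. The function names are hypothetical, and the subject is a placeholder because the slides do not give the URI scheme for subjects.

```python
BASE = "http://zhishi.me"


def property_uri(source_name, property_name):
    """Fill the [sourceName] and [propertyName] slots of the URI pattern."""
    return f"{BASE}/{source_name}/property/{property_name}"


def lang_literal(value, lang):
    """Render a language-tagged literal such as "南京"@zh."""
    return f'"{value}"@{lang}'


def infobox_triple(subject, source_name, property_name, value, lang="zh"):
    """One (subject, predicate, object) triple for a single infobox entry."""
    return (
        subject,
        property_uri(source_name, property_name),
        lang_literal(value, lang),
    )


# SUBJECT is a placeholder: the subject URI scheme is not given in the slides.
SUBJECT = "<subject-of-the-南京-article>"
triple = infobox_triple(SUBJECT, "baidubaike", "中文名称", "南京")
print(triple)
```

Keeping the source name in the property URI means the same infobox field from Baidu Baike and Hudong Baike stays distinguishable until entity matching aligns them.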

Entity Matching (cont.)
  • For each pair of existing matched instances, their property-value pairs are merged:

    Value | Property_1 | Property_2
    "大熊猫" | baidu:标签 | hudong:中文学名
    "Ailuropoda melanoleuca" | baidu:拉丁学名 | hudong:二名法
    "白鳍豚" | baidu:标签 | hudong:中文学名
    "桂花" | baidu:标签 | hudong:中文学名
    … | … | …

  • Matching rule (obtained by frequent set mining): baidu:x and hudong:x are matched iff
      valueOf(baidu:标签) = valueOf(hudong:中文学名) and
      valueOf(baidu:拉丁学名) = valueOf(hudong:二名法) and
      valueOf(baidu:纲) = valueOf(hudong:纲)
  • The obtained rule(s) are applied to the unlabeled data to generate candidate matches.
  • A combiner merges the confidence values of each candidate match.
  (Entity matching slides from Haofen Wang)
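The rule-derivation step above can be sketched as follows. This is a simplified stand-in, not the method from the slides: it counts individual property pairs whose values agree across known matches instead of mining full frequent itemsets, and it omits the confidence combiner. `mine_property_pairs`, `rule_matches`, and `min_support` are hypothetical names; the seed values come from the merged table, while "银杏" is an invented test entity.

```python
from collections import Counter
from itertools import product


def mine_property_pairs(matched_pairs, min_support=0.5):
    """Property pairs whose values agree in at least min_support of the matches.

    matched_pairs: list of (props_a, props_b), each a dict property -> value.
    """
    counts = Counter()
    for a, b in matched_pairs:
        agreeing = {
            (pa, pb)
            for (pa, va), (pb, vb) in product(a.items(), b.items())
            if va == vb
        }
        counts.update(agreeing)
    threshold = min_support * len(matched_pairs)
    return sorted(pair for pair, n in counts.items() if n >= threshold)


def rule_matches(rule, a, b):
    """True when every property pair in the rule is present and has equal values."""
    return all(pa in a and pb in b and a[pa] == b[pb] for pa, pb in rule)


# Seed matches built from the values in the merged table.
seed = [
    ({"baidu:标签": "大熊猫", "baidu:拉丁学名": "Ailuropoda melanoleuca"},
     {"hudong:中文学名": "大熊猫", "hudong:二名法": "Ailuropoda melanoleuca"}),
    ({"baidu:标签": "白鳍豚"}, {"hudong:中文学名": "白鳍豚"}),
    ({"baidu:标签": "桂花"}, {"hudong:中文学名": "桂花"}),
]
rule = mine_property_pairs(seed, min_support=0.5)

# "银杏" is an invented unlabeled candidate, used only to exercise the rule.
print(rule_matches(rule, {"baidu:标签": "银杏"}, {"hudong:中文学名": "银杏"}))
```

With this seed, only the label pair reaches the support threshold; lowering `min_support` would also admit the Latin-name pair and make the rule stricter.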

Type Inference
  • Type information stating that an instance is of a certain type (e.g. China is an instance of country) is an important component of knowledge bases.
  • Application scenario: question answering.
      Question: Who is the Nobel laureate in literature of the People's Republic of China?
      Answer: Mo Yan.
      How to get the answer? Mo Yan instanceOf Nobel laureate of the People's Republic of China.

Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229 (The 4th Joint International Semantic Technology Conference)

Approach
  • In Chinese online encyclopedias, many fine-grained types appear among the categories of article pages.
  • For example, "Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium".
  • Another example: the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia.

[Figure: categories of "China" in the three encyclopedias]

Approach (cont.)
  • We take the categories of a given instance as its candidate types and try to filter out the noise by leveraging the attributes.
  • Intuitively, the attributes of an instance hint at its type:
      "actors, release date, director": an instance of "Movie"
      "name, foreign name": an instance of "?"
  • We assume that if an instance contains the representative attributes of a candidate type, the instance probably belongs to this type.
  • A remaining problem: category attributes are not abundantly available.

Approach (cont.)
  • Explicit IsA Relation Detector: detects explicit instanceOf and subclassOf relations.
  • Category Attributes Generator: generates attributes for categories with an attribute propagation algorithm.
  • Instance Type Ranker: ranks candidate types with a graph-based random walk method.

Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection

Mining explicit instanceOf relations from infoboxes
  • I = {i1, i2, …, in}: the instance set (all articles)
  • C = {c1, c2, …, cm}: the concept set (all article categories)
  • An infobox is a set of attribute-value pairs (AVPs): {<a1, v1>, …, <ak, vk>}
  • Pattern: vk instanceOf ak
  • Example: <director, Steven Spielberg>, i.e. Steven Spielberg instanceOf director
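The infobox pattern above can be sketched in a few lines. The slides define the sets I and C but do not spell out the membership test, so this sketch assumes a pair <a, v> yields "v instanceOf a" only when v is a known instance and a is a known concept. `explicit_instance_of` is a hypothetical name, and the second AVP is invented for contrast.

```python
def explicit_instance_of(avps, instances, concepts):
    """Turn infobox attribute-value pairs into (value, "instanceOf", attribute).

    ASSUMPTION: a pair qualifies only if the value is in the instance set I
    and the attribute name is in the concept set C.
    """
    return [
        (value, "instanceOf", attribute)
        for attribute, value in avps
        if value in instances and attribute in concepts
    ]


instances = {"Steven Spielberg"}   # I: all articles (toy subset)
concepts = {"director"}            # C: all article categories (toy subset)
avps = [
    ("director", "Steven Spielberg"),  # example from the slide
    ("release date", "1993"),          # invented pair; the value is not an article
]
relations = explicit_instance_of(avps, instances, concepts)
print(relations)
```

Filtering on both sets keeps literal values such as dates from being turned into instances.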

Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection (cont.)

Mining explicit instanceOf relations from abstracts
  • Perform dependency parsing with FudanNLP [Qiu et al., 2013] to identify the subject, predicate and object.
  • Example: 迈克尔·乔丹 (Michael Jeffrey Jordan) instanceOf 篮球运动员 (Basketball Player)

Explicit IsA Relation Detector: Explicit SubclassOf Relation Detection
  • Generate candidate subclassOf category pairs of the form (sub-category, category) based on the category system.
  • Check whether the sub-category and the category share the same lexical head, using POS tagging.
  • For each (sub-category, category) pair, check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
  • Example: 江苏学校 (school in Jiangsu) subclassOf 中国学校 (school in China)
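The lexical-head check can be sketched on tokens that a POS tagger has already labelled. Two assumptions are made here: the head of a Chinese category name is taken to be its last common-noun token, and the tags follow a Penn Chinese Treebank style ("NN" for common noun, "NR" for proper noun). The tagger itself is out of scope, the function names are hypothetical, and "江苏人物" is an invented counter-example.

```python
def lexical_head(tagged_tokens, noun_tags=("NN",)):
    """Last token whose POS tag is a common noun, or None if there is none."""
    for word, tag in reversed(tagged_tokens):
        if tag in noun_tags:
            return word
    return None


def shares_lexical_head(sub_category, category):
    """True when both tagged category names have the same lexical head."""
    head = lexical_head(sub_category)
    return head is not None and head == lexical_head(category)


# Pre-tagged category names; the tags are ASSUMED tagger output.
jiangsu_schools = [("江苏", "NR"), ("学校", "NN")]
china_schools = [("中国", "NR"), ("学校", "NN")]
jiangsu_people = [("江苏", "NR"), ("人物", "NN")]  # invented counter-example

print(shares_lexical_head(jiangsu_schools, china_schools))
print(shares_lexical_head(jiangsu_people, china_schools))
```

The head check is cheap and only filters candidates; pairs that pass it would still go through the Zhishi.schema lookup described above.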

Category Attributes Generator
  • We take the attributes in infobox templates as existing category attributes, and the attributes in the infoboxes of article pages as instance attributes.
  • We construct a Category Graph composed of all categories with subclassOf relations.
  • We propagate attributes over the Category Graph, leveraging the existing category attributes, the instance attributes, and the identified instanceOf and subclassOf relations.

The attribute propagation algorithm is based on the following rules:
  • Rule 1: If a category c has attributes from infobox templates, these attributes should remain unchanged.
  • Rule 2: If a category c has some instances with attributes, the attributes should be propagated to c when they are shared by more than half of these instances.
  • Rule 3: If a category c has some child categories with attributes, the attributes should be propagated to c when they are shared by more than half of these child categories.
  • Rule 4: If the parent categories of a category c have attributes, all of these attributes should be inherited by c.
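One way to read the four rules is as a bottom-up pass (Rules 2 and 3) followed by a top-down pass (Rule 4), with template attributes frozen throughout (Rule 1). The slides do not state the order in which the rules are applied, so that ordering is an assumption of this sketch, as are the function names and the toy category, instance, and attribute names.

```python
from collections import Counter


def majority(attr_sets):
    """Attributes shared by more than half of the non-empty attribute sets."""
    attr_sets = [s for s in attr_sets if s]
    counts = Counter(a for s in attr_sets for a in s)
    return {a for a, n in counts.items() if n > len(attr_sets) / 2}


def propagate(template_attrs, instance_attrs, instances_of, children):
    """Return category -> attribute set after applying Rules 1-4."""
    categories = set(template_attrs) | set(instances_of) | set(children)
    for kids in children.values():
        categories.update(kids)
    result = {}

    def bottom_up(c):
        if c in result:
            return result[c]
        if c in template_attrs:
            result[c] = set(template_attrs[c])  # Rule 1: keep as is
            return result[c]
        result[c] = set()  # placeholder, also stops recursion on cycles
        from_instances = majority(
            [instance_attrs.get(i, set()) for i in instances_of.get(c, [])]
        )  # Rule 2
        from_children = majority(
            [bottom_up(k) for k in children.get(c, [])]
        )  # Rule 3
        result[c] = from_instances | from_children
        return result[c]

    for c in categories:
        bottom_up(c)

    changed = True
    while changed:  # Rule 4: inherit from parents until nothing changes
        changed = False
        for parent, kids in children.items():
            for k in kids:
                if k in template_attrs:
                    continue  # Rule 1 still applies
                missing = result[parent] - result[k]
                if missing:
                    result[k] |= missing
                    changed = True
    return result


# Toy data; every name below is invented for illustration.
attrs = propagate(
    template_attrs={"Person": {"name"}},
    instance_attrs={"i1": {"films", "awards"}, "i2": {"films"}, "i3": {"spouse"}},
    instances_of={"Director": ["i1", "i2", "i3"]},
    children={"Person": ["Director"]},
)
print(attrs)
```

In the toy data, "films" is shared by two of the three instances and so reaches "Director", which then inherits "name" from its parent.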

Instance Type Ranker
  • We organize each given instance, its attributes, and the categories (i.e. candidate types) of the corresponding article page into an Instance Graph.
  • We group synonymous attributes with BabelNet before constructing the Instance Graphs.
  • We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
  • When executing a random step from the given instance to one of its attributes, the walk tends to choose the most representative attribute, in order to walk to the correct categories.
  • When executing a random step from an attribute to one of the categories of the article page, the categories containing this attribute have equal opportunity.
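The two random steps described above can be sketched as a two-step walk whose end distribution scores the candidate types. The slides give no formula, so the weighting is an assumption: an attribute's weight is taken as 1 divided by the number of categories it belongs to, and the walk is not iterated further. `rank_types` and the toy data are hypothetical.

```python
def rank_types(instance_attrs, candidate_types, category_attrs):
    """Score candidate types by a two-step walk: instance -> attribute -> category.

    category_attrs maps every known category to its attribute set.
    ASSUMPTION: attribute weight = 1 / (number of categories containing it).
    """
    weights = {}
    for attr in instance_attrs:
        n = sum(1 for attrs in category_attrs.values() if attr in attrs)
        if n:  # attributes that belong to no category cannot reach a type
            weights[attr] = 1.0 / n
    total = sum(weights.values())

    scores = {c: 0.0 for c in candidate_types}
    for attr, w in weights.items():
        # Step 2: candidate categories containing the attribute are equally likely.
        targets = [c for c in candidate_types if attr in category_attrs.get(c, ())]
        for c in targets:
            scores[c] += (w / total) / len(targets)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data; attribute sets per category are invented for illustration.
category_attrs = {
    "Movie": {"actors", "director", "name"},
    "Living People": {"name"},
    "Person": {"name"},
}
ranking = rank_types(
    instance_attrs=["actors", "director", "name"],
    candidate_types=["Movie", "Living People"],
    category_attrs=category_attrs,
)
print(ranking)
```

Because "name" belongs to many categories it carries little weight, so the noisy candidate "Living People" ends up far below "Movie".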

Experiment: Accuracy Evaluation
  • We randomly select 500 (category, attribute) pairs from each online encyclopedia, and 500 type statements from different sources in each online encyclopedia.
  • We invite six postgraduate students who are familiar with linked data to label each sample mentioned above as "Correct", "Incorrect", or "Unknown".
  • To generalize the findings on each sample to the whole dataset, we compute the Wilson intervals [Brown et al., 2001] for α = 5%.
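For reference, the Wilson score interval for a labelled sample can be computed as below. This is the standard formula, not code from the evaluation itself; the count of 450 correct labels out of 500 is invented purely to exercise the function.

```python
import math
from statistics import NormalDist


def wilson_interval(correct, n, alpha=0.05):
    """Wilson score interval for a proportion, at confidence level 1 - alpha."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# 450 of 500 is an invented count, not a result from the experiment.
low, high = wilson_interval(450, 500)
print(round(low, 4), round(high, 4))
```

Unlike the simple normal approximation, the interval stays inside [0, 1] even when the observed accuracy is close to 100%.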

Experiment: Comparison with Other Knowledge Bases (Overlap of Type Information)
  • We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet.
  • Since DBpedia and Yago have multilingual versions, we mapped the English type statements to Chinese ones (both the instance and the type in a type statement can be mapped to Chinese labels).

Technologies of Knowledge Base Construction
  • Web access to Zhishi.me: http://zhishi.me/api

Schedule of My Talk
  • Encyclopedic knowledge graph construction
  • Buddhist knowledge graph construction

Framework (take Buddhist figures as the example)

Knowledge Collection
  • Category method
      - Manually inspect the encyclopedia categories related to Buddhist figures.
      - Extract the entities of all articles under the Buddhist-figure categories.
  • Naming-rule method
      - Example rules: “.+菩萨”, “.+禅师”
      - All entities under the Wikipedia category “佛教头衔” (Buddhist titles)
      - High-frequency common substrings in the names of the entities already extracted

Knowledge Fusion
  • Subject fusion
      - The entity's “别名” (alias) attribute and its redirects together form the entity's alias set.
      - Entities from different sources are considered the same entity if they share an exactly matching alias.
      - Mappings that involve more than three identical entities are checked manually.
  • Example (百度百科 / Baidu Baike, 互动百科 / Hudong Baike, 维基百科 / Wikipedia): {确吉坚赞, 班禅额尔德尼·确吉坚赞, 罗桑赤烈伦珠} {班禅额尔德
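The naming rules and the alias-based fusion test can be sketched together. Two assumptions: a rule must match the whole entity name, and "exactly matching alias" means plain string equality between members of the two alias sets. The function names are hypothetical, "观音菩萨" is an invented test name, and the second alias set below is invented because the example in the slides is cut off.

```python
import re

# Naming rules quoted in the slides, compiled as regular expressions.
NAME_RULES = [re.compile(r".+菩萨"), re.compile(r".+禅师")]


def matches_naming_rule(name):
    """True when the whole name matches at least one naming rule."""
    return any(rule.fullmatch(name) for rule in NAME_RULES)


def same_entity(aliases_a, aliases_b):
    """True when two alias sets share at least one identical alias."""
    return not set(aliases_a).isdisjoint(aliases_b)


# First set is from the slides; the second is INVENTED for illustration.
aliases_source_1 = {"确吉坚赞", "班禅额尔德尼·确吉坚赞", "罗桑赤烈伦珠"}
aliases_source_2 = {"确吉坚赞"}

print(matches_naming_rule("观音菩萨"))
print(same_entity(aliases_source_1, aliases_source_2))
```

Exact alias equality favors precision over recall, which fits the manual check that the slides reserve for the larger mappings.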
