




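Introduction to Encyclopedic and Buddhist Knowledge Graph Construction Technologies
Guilin Qi (漆桂林), Institute of Cognitive Intelligence, Southeast University

Schedule of My Talk
1. Encyclopedic knowledge graph construction technologies
2. Buddhist knowledge graph construction technologies

Introduction of Knowledge Bases
What is knowledge? Facts, information, descriptions, or skills acquired through experience or education by perceiving, discovering, or learning.
Knowledge base: an organized repository of knowledge consisting of concepts, instances, relations (properties), facts, rules, etc. It is a principal part of expert systems: "the power of an AI program came to be seen as largely in its knowledge base" (Edward Feigenbaum, 1994 ACM Turing Award).

Development of Knowledge Bases in Recent Decades
[Figure: timeline from 1985 to 2012. Examples shown: the assertion (#$capitalCity #$France #$Paris); the concept chain student, enrollee, person; "35 million articles in 288 different languages"; "15 thousand concepts, 600 million instances, 20 billion facts"; NELL; Google]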
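Knowledge Graph (KG)
It is a new generation of intelligent search technology, which enables you to search for things, not strings.
Formal definition: a knowledge graph is a knowledge base with a graph structure, where the nodes are instances or concepts and the edges are relations between them.
It is a special semantic network.
It belongs to knowledge engineering.
[Figure: Example KG centered on 中兴通讯 (ZTE). Legend: listed company (上市公司), unlisted company (非上市公司), subsidiary (子公司); relations: supplier (供应商), customer (客户), competitor (竞争对手), partner (合作伙伴). Entities: 中兴康讯, Acacia (in IPO), 卓翼科技, 美国高通 (Qualcomm), 共进股份, 宇顺电子, 美国博通 (Broadcom), 中国移动 (China Mobile), 英特尔 (Intel), 华为 (Huawei), 中国联通 (China Unicom), 大富科技, 华星创业, 盛路通信, 超声电子]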
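To make the formal definition concrete, the sketch below stores a tiny knowledge graph as directed, labeled edges between nodes (instances or concepts). It is an illustration under simple assumptions, not how any particular KG is stored: the class `TripleGraph` and its methods are hypothetical, and the triples reuse two facts that appear in this talk, the assertion (#$capitalCity #$France #$Paris) and "China is an instance of country".

```python
from collections import defaultdict


class TripleGraph:
    """A tiny knowledge graph: nodes are instances or concepts, edges are labeled relations."""

    def __init__(self):
        # head node -> list of (relation, tail node) edges
        self.out_edges = defaultdict(list)

    def add(self, head, relation, tail):
        """Add one directed edge head --relation--> tail."""
        self.out_edges[head].append((relation, tail))

    def objects(self, head, relation):
        """Tail nodes reachable from `head` through one edge labeled `relation`."""
        # .get avoids inserting empty entries for unknown heads
        return [t for r, t in self.out_edges.get(head, []) if r == relation]

    def nodes(self):
        """Every node that appears as a head or a tail."""
        found = set(self.out_edges)
        for edges in self.out_edges.values():
            found.update(t for _, t in edges)
        return found


kg = TripleGraph()
kg.add("France", "capitalCity", "Paris")   # instance -> instance
kg.add("France", "instanceOf", "country")  # instance -> concept
kg.add("China", "instanceOf", "country")
print(kg.objects("France", "capitalCity"))  # ['Paris']
```

Keeping edges as (relation, tail) pairs per head makes one-hop lookups cheap; a real KG store would also index by relation and tail.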
KG and Semantic Search: go deeper and broader.

Technologies of Knowledge Base Construction
[Figure: Baidu, Hudong, Zh-Wikipedia]

Knowledge Graph (KG) Construction from Online Encyclopedias
Well-known open knowledge graphs such as DBpedia, Yago and Zhishi.me are built from online encyclopedias.
Technologies of encyclopedic knowledge graph construction: data extraction, entity matching, and type inference.

Zhishi.me
Zhishi.me (http://zhishi.me) is the first effort to publish large-scale Chinese semantic data and link them together as Chinese Linking Open Data (CLOD).

Overview of Zhishi.me
Currently, it consists of structured data extracted from the three largest Chinese encyclopedia sites: Baidu Baike, Hudong Baike, and Chinese Wikipedia. It now has over 10 million distinct instances and 200 million RDF triples, and can be accessed through an online API, a lookup service, and a SPARQL endpoint.

Data Extraction
Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi, Yong Yu: Zhishi.me - Weaving Chinese Linking Open Data. ISWC 2011: 205-220
Labels, abstracts, redirects, images: rdfs:label, zhishi:abstract, rdfs:comment, dbpedia:abstract, zhishi:pageRedirects, zhishi:thumbnail
Infobox properties: http://zhishi.me/[sourceName]/property/[propertyName], e.g. http://zhishi.me/baidubaike/property/中文名称 ("Chinese name") with the value "南京"@zh (Nanjing)
Internal links: zhishi:internalLink, zhishi:category, skos:broader

Entity Matching
Equivalent entities, e.g. Baidu:北京 and Zh-Wiki:北京市 (both refer to Beijing).
Dataset-specific matching rules are discovered and refined automatically, in iterations. The rules are derived by finding the most discriminative data characteristics for a given data source pair, e.g. (baidu:北京, Zh-wiki:北京市). (From Haofen Wang)
For each pair of existing matched instances, their property-value pairs are merged:

Value | Property_1 | Property_2
"大熊猫" (giant panda) | baidu:标签 | hudong:中文学名
"Ailuropoda melanoleuca" | baidu:拉丁学名 | hudong:二名法
"白鳍豚" (baiji) | baidu:标签 | hudong:中文学名
"桂花" (sweet osmanthus) | baidu:标签 | hudong:中文学名
… | … | …

(baidu:标签 = label; hudong:中文学名 = Chinese scientific name; baidu:拉丁学名 = Latin scientific name; hudong:二名法 = binomial name)
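The merging step, and the frequency idea behind the mined matching rule, can be sketched as follows, assuming each infobox is a plain dict from property to a single value. The function names are hypothetical, the toy infoboxes are illustrative, and counting property pairs across matched instances is a simplified stand-in for full frequent set mining.

```python
from collections import Counter


def merge_property_values(baidu_box, hudong_box):
    """Merge the infoboxes of one matched instance pair into
    (value, property_1, property_2) rows, one row per shared value."""
    return [
        (v1, p1, p2)
        for p1, v1 in baidu_box.items()
        for p2, v2 in hudong_box.items()
        if v1 == v2
    ]


def frequent_property_pairs(matched_pairs, min_support):
    """Count, over all matched instance pairs, how often a Baidu property and a
    Hudong property carry the same value; keep pairs seen at least min_support times."""
    counts = Counter()
    for baidu_box, hudong_box in matched_pairs:
        rows = merge_property_values(baidu_box, hudong_box)
        # count each property pair at most once per matched instance pair
        counts.update({(p1, p2) for _, p1, p2 in rows})
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Illustrative matched pairs (giant panda, baiji); values are examples only.
matched = [
    ({"baidu:标签": "大熊猫", "baidu:拉丁学名": "Ailuropoda melanoleuca", "baidu:纲": "哺乳纲"},
     {"hudong:中文学名": "大熊猫", "hudong:二名法": "Ailuropoda melanoleuca", "hudong:纲": "哺乳纲"}),
    ({"baidu:标签": "白鳍豚", "baidu:拉丁学名": "Lipotes vexillifer", "baidu:纲": "哺乳纲"},
     {"hudong:中文学名": "白鳍豚", "hudong:二名法": "Lipotes vexillifer", "hudong:纲": "哺乳纲"}),
]
print(frequent_property_pairs(matched, min_support=2))
```

Property pairs that survive the support threshold are the raw material for a conjunctive matching rule.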
Matching rule (obtained by frequent set mining): baidu:x and hudong:x are matched iff valueOf(baidu:标签) = valueOf(hudong:中文学名) and valueOf(baidu:拉丁学名) = valueOf(hudong:二名法) and valueOf(baidu:纲) = valueOf(hudong:纲), where 纲 is the taxonomic class.
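Read as code, the mined rule is a conjunction of value-equality tests between aligned properties. The sketch assumes dict-shaped infoboxes; `RULE` and `rule_matches` are hypothetical names, and a missing property is treated as a failed test.

```python
# The mined rule as aligned (Baidu property, Hudong property) pairs.
RULE = [
    ("baidu:标签", "hudong:中文学名"),
    ("baidu:拉丁学名", "hudong:二名法"),
    ("baidu:纲", "hudong:纲"),
]


def rule_matches(baidu_box, hudong_box, rule=RULE):
    """True iff every aligned property pair is present on both sides with equal values."""
    return all(
        p1 in baidu_box and p2 in hudong_box and baidu_box[p1] == hudong_box[p2]
        for p1, p2 in rule
    )


panda_baidu = {"baidu:标签": "大熊猫", "baidu:拉丁学名": "Ailuropoda melanoleuca", "baidu:纲": "哺乳纲"}
panda_hudong = {"hudong:中文学名": "大熊猫", "hudong:二名法": "Ailuropoda melanoleuca", "hudong:纲": "哺乳纲"}
print(rule_matches(panda_baidu, panda_hudong))                 # True
print(rule_matches(panda_baidu, {"hudong:中文学名": "大熊猫"}))  # False
```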
The obtained rule(s) are applied to the unlabeled data to generate match candidates, and a combiner combines the confidence values of each match candidate.
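A sketch of applying several rules to unlabeled entity pairs and combining the confidences of each candidate. The talk does not specify the combiner; this sketch assumes a noisy-or combination, 1 - prod(1 - c_i), and the per-rule confidence values, entity IDs, and function names are hypothetical.

```python
from itertools import product
from math import prod


def rule_matches(box1, box2, rule):
    """A rule is a list of aligned (property_1, property_2) pairs; all values must be equal."""
    return all(p1 in box1 and p2 in box2 and box1[p1] == box2[p2] for p1, p2 in rule)


def noisy_or(confidences):
    """Assumed combiner: probability that at least one matching rule is right."""
    return 1.0 - prod(1.0 - c for c in confidences)


def generate_candidates(source1, source2, weighted_rules):
    """Apply every (rule, confidence) to every unlabeled entity pair and return
    {(id1, id2): combined confidence} for the pairs matched by at least one rule."""
    candidates = {}
    for (id1, box1), (id2, box2) in product(source1.items(), source2.items()):
        confs = [conf for rule, conf in weighted_rules if rule_matches(box1, box2, rule)]
        if confs:
            candidates[(id1, id2)] = noisy_or(confs)
    return candidates


rules = [
    ([("baidu:标签", "hudong:中文学名"), ("baidu:拉丁学名", "hudong:二名法")], 0.9),
    ([("baidu:标签", "hudong:中文学名")], 0.5),
]
baidu = {"baidu:桂花": {"baidu:标签": "桂花", "baidu:拉丁学名": "Osmanthus fragrans"}}
hudong = {"hudong:桂花": {"hudong:中文学名": "桂花", "hudong:二名法": "Osmanthus fragrans"}}
print(generate_candidates(baidu, hudong, rules))
```

Noisy-or rewards agreement between independent rules; an average or maximum would be equally plausible choices, since the combiner is not described.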
Type Inference
Type information stating that an instance is of a certain type (e.g. China is an instance of country) is an important component of knowledge bases. Consider an application scenario, question answering:
Question: Who is the Nobel laureate in literature of the People's Republic of China?
Answer: Mo Yan.
How do we get the answer?
Mo Yan instanceOf "Nobel laureate of the People's Republic of China"
Tianxing Wu, Shaowei Ling, Guilin Qi, Haofen Wang: Mining Type Information from Chinese Online Encyclopedias. JIST 2014: 213-229
The 4th Joint International Semantic Technology Conference

Approach
In Chinese online encyclopedias, we find that many fine-grained types exist in the categories of article pages.
"Tim Berners-Lee" has several categories: "English computer scientists", "People associated with CERN", "English expatriates in the United States", "Living People", "World Wide Web Consortium".
For example, given the article pages of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia, its categories are as follows:
[Figure: categories of "China" in Baidu Baike, Hudong Baike and Chinese Wikipedia]
Approach (cont.)
We take the categories of a given instance as its candidate types and try to filter out the noise by leveraging the attributes.
Intuitively, given the attributes of an instance:
"actors, release date, director" → an instance of "Movie"
"name, foreign name" → an instance of "?"
We assume that if an instance contains the representative attributes of a candidate type, the instance probably belongs to this type.
But another problem is that category attributes are not abundantly available.
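The assumption can be illustrated with a simple coverage score: the fraction of a candidate type's representative attributes that the instance has. This is only a sketch of the intuition, not the ranking method of the talk; the threshold, function names, and attribute lists are hypothetical, and it presupposes that category attributes exist, which is exactly the gap noted above.

```python
def coverage(instance_attrs, type_attrs):
    """Fraction of a candidate type's representative attributes found on the instance."""
    if not type_attrs:
        return 0.0  # no known attributes for this type: no evidence for it
    return len(instance_attrs & type_attrs) / len(type_attrs)


def likely_types(instance_attrs, candidate_types, threshold=0.5):
    """Keep candidate types (categories) whose coverage reaches the threshold."""
    return sorted(
        t for t, attrs in candidate_types.items()
        if coverage(instance_attrs, attrs) >= threshold
    )


candidates = {
    "Movie": {"actors", "release date", "director"},
    "Person": {"birth date", "nationality"},
    "Living People": set(),  # a category with no known attributes
}
print(likely_types({"actors", "release date", "director", "running time"}, candidates))
# ['Movie']
```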
Approach (cont.)
Explicit IsA Relation Detector: detects explicit instanceOf and subclassOf relations.
Category Attributes Generator: generates attributes for categories with an attribute propagation algorithm.
Instance Type Ranker: ranks candidate types with a graph-based random walk method.
Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from infoboxes:
$I = \{i_1, i_2, \ldots, i_n\}$: an instance set (all articles)
$C = \{c_1, c_2, \ldots, c_m\}$: a concept set (all article categories)
Infobox: an attribute-value pair (AVP) set $\{\langle a_1, v_1\rangle, \ldots, \langle a_k, v_k\rangle\}$
For each AVP, $v_k$ instanceOf $a_k$. Example: $\langle$director, Steven Spielberg$\rangle$, i.e. Steven Spielberg instanceOf director.
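A minimal sketch of the infobox rule (value instanceOf attribute), assuming an infobox is given as a list of (attribute, value) pairs. The optional `known_instances` filter is an assumption added for illustration, since the talk does not state how non-entity values such as dates are handled; the function name and the second example pair are hypothetical.

```python
def infobox_instance_of(infobox, known_instances=None):
    """Turn each attribute-value pair <a, v> into a statement (v, instanceOf, a).

    known_instances (assumed, optional): if given, keep only values that are
    themselves article titles, i.e. members of the instance set I.
    """
    statements = []
    for attribute, value in infobox:
        if known_instances is None or value in known_instances:
            statements.append((value, "instanceOf", attribute))
    return statements


avps = [("director", "Steven Spielberg"), ("release date", "1993")]
print(infobox_instance_of(avps, known_instances={"Steven Spielberg"}))
# [('Steven Spielberg', 'instanceOf', 'director')]
```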
Explicit IsA Relation Detector: Explicit InstanceOf Relation Detection
Mining explicit instanceOf relations from abstracts:
Perform dependency parsing with FudanNLP [Qiu et al., 2013] to obtain subject-predicate-object structures.
Example: 迈克尔·乔丹 (Michael Jeffrey Jordan) instanceOf 篮球运动员 (basketball player).
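The abstract-based detector can be sketched once the parser output is available. This sketch does not call FudanNLP; it assumes each sentence of an abstract has already been reduced to a (subject, predicate, object) triple, and it assumes the copula 是 ("is") is the predicate that signals instanceOf. The function name and the copula list are hypothetical.

```python
def instance_of_from_parses(entity, parsed_triples, copulas=("是",)):
    """Collect (entity, instanceOf, object) from subject-predicate-object triples.

    Only triples whose subject is the article's own entity and whose predicate
    is an (assumed) copula are used.
    """
    return [
        (entity, "instanceOf", obj)
        for subj, pred, obj in parsed_triples
        if subj == entity and pred in copulas
    ]


triples = [("迈克尔·乔丹", "是", "篮球运动员"), ("迈克尔·乔丹", "效力", "芝加哥公牛队")]
print(instance_of_from_parses("迈克尔·乔丹", triples))
# [('迈克尔·乔丹', 'instanceOf', '篮球运动员')]
```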
Explicit IsA Relation Detector: Explicit SubclassOf Relation Detection
Generate candidate subclassOf category pairs of the form (sub-category, category) based on the category system.
Check, with POS tagging, whether the sub-category and the category share the same lexical head.
For each (sub-category, category) pair, check whether the category is a parent concept of the sub-category in Zhishi.schema [Wang et al., 2014].
Example: 江苏学校 (school in Jiangsu) subclassOf 中国学校 (school in China).
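A sketch of the lexical-head test, assuming the category names are already segmented and POS-tagged as (word, tag) pairs with PKU-style tags (n, ns, ...), and assuming Chinese category names are head-final, so the head is the last noun. The Zhishi.schema check is not reproduced; the function names are hypothetical.

```python
def lexical_head(tagged_words):
    """Head of a (word, POS) sequence: the last noun, assuming head-final compounds."""
    for word, pos in reversed(tagged_words):
        if pos.startswith("n"):  # n, ns, nr, ... are noun tags in this tag set
            return word
    return None


def share_head(sub_category, category):
    """True if both tagged category names have the same (non-empty) lexical head."""
    head = lexical_head(sub_category)
    return head is not None and head == lexical_head(category)


jiangsu_schools = [("江苏", "ns"), ("学校", "n")]
china_schools = [("中国", "ns"), ("学校", "n")]
print(share_head(jiangsu_schools, china_schools))  # True
```

A shared head only makes (sub-category, category) a plausible subclassOf pair; it does not prove it.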
Category Attributes Generator
We take attributes in infobox templates as existing category attributes, and attributes in the infoboxes of article pages as instance attributes.
We construct a Category Graph composed of all categories and the subclassOf relations between them.
We propagate attributes over the Category Graph, leveraging existing category attributes, instance attributes, and the identified instanceOf and subclassOf relations.
The attribute propagation algorithm is based on the following rules:
Rule 1: If a category c has attributes from infobox templates, these attributes should remain unchanged.
Rule 2: If a category c has instances with attributes, an attribute should be propagated to c when it is shared by more than half of these instances.
Rule 3: If a category c has child categories with attributes, an attribute should be propagated to c when it is shared by more than half of these child categories.
Rule 4: If the parent categories of a category c have attributes, all of these attributes should be inherited by c.
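The four rules can be turned into a two-pass propagation over the Category Graph. The talk does not give the order in which the rules fire; this sketch assumes a bottom-up pass for Rules 2 and 3 followed by a top-down pass for Rule 4, and reads Rule 1 as "categories with template attributes are never modified". The data structures, function names, and example categories are hypothetical.

```python
from collections import Counter


def majority_attributes(attribute_sets):
    """Attributes shared by more than half of the non-empty attribute sets."""
    non_empty = [s for s in attribute_sets if s]
    counts = Counter(a for s in non_empty for a in s)
    return {a for a, n in counts.items() if n > len(non_empty) / 2}


def propagate_attributes(categories, parents, template_attrs, members, instance_attrs):
    """categories: all categories in topological order (parents before children).
    parents: category -> parent categories (subclassOf edges).
    template_attrs: category -> attributes from its infobox template.
    members: category -> its instances (instanceOf edges).
    instance_attrs: instance -> attributes from its infobox."""
    children = {c: [] for c in categories}
    for c in categories:
        for p in parents.get(c, []):
            children[p].append(c)

    attrs = {}
    # Bottom-up pass (children before parents): Rules 1-3.
    for c in reversed(categories):
        if c in template_attrs:
            attrs[c] = set(template_attrs[c])  # Rule 1
            continue
        from_instances = majority_attributes(
            [instance_attrs.get(i, set()) for i in members.get(c, [])])  # Rule 2
        from_children = majority_attributes([attrs[ch] for ch in children[c]])  # Rule 3
        attrs[c] = from_instances | from_children

    # Top-down pass (parents before children): Rule 4.
    for c in categories:
        if c in template_attrs:
            continue  # Rule 1: template attributes stay unchanged
        for p in parents.get(c, []):
            attrs[c] |= attrs[p]
    return attrs


categories = ["Person", "Singer", "Actor", "Writer", "Poet"]
parents = {"Singer": ["Person"], "Actor": ["Person"], "Writer": ["Person"], "Poet": ["Writer"]}
templates = {"Singer": {"name", "album"}, "Actor": {"name", "films"}}
members = {"Writer": ["w1", "w2"]}
instance_attrs = {"w1": {"name", "works"}, "w2": {"name"}}
result = propagate_attributes(categories, parents, templates, members, instance_attrs)
print(sorted(result["Poet"]))  # ['name']
```

Here "Writer" gets "name" from its instances (Rule 2), "Person" gets it from three children (Rule 3), and "Poet", which has no data of its own, inherits it (Rule 4).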
Instance Type Ranker
We organize each given instance, its attributes, and the categories (i.e. candidate types) of the corresponding article page into an Instance Graph. Synonymous attributes are grouped with BabelNet before all Instance Graphs are constructed.
We assume that the fewer categories an attribute belongs to, the more representative the attribute is.
When the walk takes a random step from the given instance to one of its attributes, it tends to choose the most representative attribute, so as to reach the correct categories. When it takes a random step from an attribute to one of the categories in the article page, the categories containing this attribute are chosen with equal probability.
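A simplified version of the ranker as a two-step walk (instance, then attribute, then category). It assumes the step to an attribute is proportional to 1 / (number of candidate categories containing that attribute), which is one way to encode "fewer categories means more representative"; the published method may use a different weighting or a longer walk. Attributes that no candidate category contains are ignored. Names and example data are hypothetical.

```python
def rank_types(instance_attrs, category_attrs):
    """Two-step random walk: instance -> attribute -> category.

    category_attrs maps each candidate category of the article page to its
    attributes (e.g. produced by attribute propagation).
    Returns (category, probability) pairs, most likely first.
    """
    # Candidate categories that contain each of the instance's attributes.
    owners = {}
    for a in sorted(instance_attrs):
        cats = [c for c, attrs in category_attrs.items() if a in attrs]
        if cats:
            owners[a] = cats
    if not owners:
        return []
    # Fewer containing categories => more representative => larger step probability.
    weight = {a: 1.0 / len(cats) for a, cats in owners.items()}
    total = sum(weight.values())
    scores = {c: 0.0 for c in category_attrs}
    for a, cats in owners.items():
        p_attr = weight[a] / total           # step 1: instance -> attribute
        for c in cats:
            scores[c] += p_attr / len(cats)  # step 2: uniform over containing categories
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


candidates = {
    "Movie": {"name", "director", "release date", "actors"},
    "Person": {"name", "foreign name", "birth date"},
}
ranking = rank_types({"name", "director", "release date"}, candidates)
print(ranking[0][0])  # Movie
```

In the example, "director" and "release date" point only to "Movie", while the shared attribute "name" is down-weighted and split between both categories.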
Experiment: Accuracy Evaluation
We randomly select 500 (category, attribute) pairs from each online encyclopedia and 500 type statements from different sources in each online encyclopedia. We invite six postgraduate students who are familiar with linked data to label each of these samples as "Correct", "Incorrect", or "Unknown". To generalize the findings on each sample to the whole dataset, we compute the Wilson intervals [Brown et al., 2001] for α = 5%.
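For reference, the Wilson score interval for a sample of n labeled items of which s are judged correct (p = s / n, z the standard normal quantile for 1 - α/2) can be computed as below. How "Unknown" labels enter n is not stated in the talk, so the function simply takes s and n; the counts in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a binomial proportion at significance level alpha."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half_width, center + half_width


# Hypothetical counts: 450 of 500 sampled statements labeled "Correct".
low, high = wilson_interval(450, 500)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the normal approximation, the Wilson interval stays inside [0, 1] and behaves well when accuracy is close to 100%.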
Experiment: Comparison with Other Knowledge Bases (Overlap of Type Information)
We compare all the obtained Chinese type information with that of other well-known knowledge bases, namely DBpedia, Yago and BabelNet. Since DBpedia and Yago have multilingual versions, we mapped their English type statements to Chinese ones (a statement is mapped when both its instance and its type can be mapped to Chinese labels).
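A sketch of the overlap comparison, assuming the multilingual labels are available as a dict from English names to Chinese labels, and measuring overlap as the fraction of our statements that also appear among the mapped statements of the other KB. The function names, the metric, and all example data are hypothetical.

```python
def map_to_chinese(english_statements, zh_label):
    """Map English (instance, type) statements to Chinese labels; a statement is
    kept only when both its instance and its type have a Chinese label."""
    return {
        (zh_label[inst], zh_label[typ])
        for inst, typ in english_statements
        if inst in zh_label and typ in zh_label
    }


def type_overlap(ours, theirs):
    """Number of shared type statements, and the share of ours that they cover."""
    shared = ours & theirs
    ratio = len(shared) / len(ours) if ours else 0.0
    return len(shared), ratio


zh_label = {"China": "中国", "Country": "国家", "Mo Yan": "莫言", "Writer": "作家"}
english = {("China", "Country"), ("Mo Yan", "Writer"), ("Paris", "City")}
ours = {("中国", "国家"), ("莫言", "诺贝尔文学奖得主")}
print(type_overlap(ours, map_to_chinese(english, zh_label)))  # (1, 0.5)
```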
Technologies of Knowledge Base Construction: Web Access to Zhishi.me
http://zhishi.me/api

Buddhist Knowledge Graph Construction
Framework (taking Buddhist figures as the example)

Knowledge Collection
Category method: manually inspect the encyclopedia categories related to Buddhist figures, and extract the entities corresponding to all articles under the Buddhist-figure categories.
Naming rule method, e.g. ".+菩萨" (… bodhisattva), ".+禅师" (… Chan master); based on all entities under the Wikipedia category "佛教头衔" (Buddhist titles) and on high-frequency common strings in the entity names already extracted.

Knowledge Fusion
Subject fusion: the "别名" (alias) attribute and the redirects of an entity form its alias set. Entities from different sources are regarded as the same entity if they share an exactly matching alias. Mappings with more than three identical entities are checked manually.
Example alias sets (Baidu Baike, Hudong Baike, Wikipedia): {确吉坚赞, 班禅额尔德尼·确吉坚赞, 罗桑赤烈伦珠} {班禅额尔德…
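The naming-rule collection and the alias-based fusion can be sketched together. The two patterns are the examples from the talk; everything else is an assumption for illustration: `collect_by_naming_rules` and `fuse_by_alias` are hypothetical names, entities are keyed by (source, name), links are merged transitively with a small union-find, and the example entities (apart from the Baidu alias set quoted above) are made up.

```python
import re
from itertools import combinations

# Example naming rules from the talk: names ending in 菩萨 (bodhisattva) or 禅师 (Chan master).
NAMING_RULES = [re.compile(r".+菩萨"), re.compile(r".+禅师")]


def collect_by_naming_rules(titles, rules=NAMING_RULES):
    """Keep article titles that fully match at least one naming rule."""
    return [t for t in titles if any(r.fullmatch(t) for r in rules)]


def fuse_by_alias(entities):
    """entities: (source, name) -> alias set (别名 values plus redirects).

    Entities from different sources are linked when their alias sets share an
    exactly matching alias. Returns (merged groups, groups to check manually)."""
    parent = {e: e for e in entities}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for e1, e2 in combinations(entities, 2):
        if e1[0] != e2[0] and entities[e1] & entities[e2]:
            parent[find(e1)] = find(e2)

    groups = {}
    for e in entities:
        groups.setdefault(find(e), set()).add(e)
    merged = [g for g in groups.values() if len(g) > 1]
    # Mappings with more than three identical entities are checked by hand.
    to_check = [g for g in merged if len(g) > 3]
    return merged, to_check


print(collect_by_naming_rules(["观音菩萨", "虚云禅师", "菩萨", "少林寺"]))
# ['观音菩萨', '虚云禅师']
entities = {
    ("baidu", "确吉坚赞"): {"确吉坚赞", "班禅额尔德尼·确吉坚赞", "罗桑赤烈伦珠"},
    ("hudong", "确吉坚赞"): {"确吉坚赞"},  # hypothetical
    ("wiki", "虚云"): {"虚云", "虚云禅师"},  # hypothetical
}
merged, to_check = fuse_by_alias(entities)
print(merged, to_check)
```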