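Integration of Knowledge Graphs
Computer Science and Novel Software Technology [...], CCKS 2016 tutorial

Outline
- Introduction to Semantic Web and knowledge [graph]
- Part I: ontology [matching]
- Part II: entity [linkage]
- Part III: an application to data [integration]

Semantic Web
- The Semantic Web was a thought from Tim Berners-Lee
- Give formal meanings to Web information [...]
- Web 1.0 (page) → Web 2.0 (social) → Web 3.0 (a web of [...])
- The Semantic Web is [about] common formats [for] integration and combination of [data] drawn from diverse [sources], and languages [for] recording how the data relates to real-world objects

RDF
- A triple consists of a subject, a predicate and an object
- Layer [...]
- The world is not made of strings, but is made of things

Linked data
- As a realization of the Semantic [Web]
- Linked Data refers to a collection of interrelated [...]
- Used for large-scale integration of, [and] reasoning on, data on the [Web]
- Linked data [principles]:
  - Use URIs to name [things]
  - Use HTTP URIs (can be [...])
  - Provide useful information using open Web standards (e.g. [...])
  - Include links to other related [things]

Linked open data (LOD)
- 1,000+ [...]
- (Figure; surviving labels: life, social)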
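Knowledge graph
- The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources
- A knowledge graph can also be seen as one huge graph: nodes represent entities or concepts, and edges are formed by properties or relations
- Besides [...]

Ontology
- A model of (part of) the real world
- Introduces domain-relevant [...]
- Specifies the meaning (semantics) of terms
- Uses a suitable logic to formalize [...]
- Example description: Heart is a muscular organ [that] is part of the circulatory [system]
- Source: I. Horrocks. Ontologies and the semantic web: the story so far.

Scale of large knowledge bases and knowledge graphs
- [...], English: 4 million entities, 500 million RDF triples; 125 [...]
- [...]: 10 million entities, 120 million RDF triples
- [...]: 40 million entities, 1 billion RDF triples
- [...] Knowledge Graph: 600 million entities, 3.5 billion RDF triples
- Others: WolframAlpha (computational knowledge engine), CMU NELL, Zhixin (知心), Sogou 知立[...]

The technology family of knowledge graphs
- (Figure; surviving labels: knowledge [...], existing knowledge, knowledge graph)

Heterogeneity
- Since long, long times [...]
- Syntactic [...]
- Schema-[...], e.g., "Wei Hu" vs. [...]
- Schema-[...]
- Terminological [...], e.g., "notebook" vs. [...]
- Data-entity [...]
- Data-entity [...]
- Pragmatic [...]

On the Semantic Web
- Data has explicit semantics, rich links, [...]
- Ontology [...]: the popularity of ontologies is rapidly growing, and the number of ontologies continues increasing

Ontology matching
- The process of determining correspondences between [...]
- Ontology matching finds a triple $\langle O, O', M \rangle$ made of a source ontology $O$, a target ontology $O'$, and a set of mapping units $M = \{m_1, m_2, \dots, m_n\}$. Each $m_i$ is a basic mapping unit and can be written as a 4-tuple $m_i = \langle id, t, t', s \rangle$:
  - $id$ is the identifier of the mapping unit and uniquely identifies the 4-tuple
  - $t$ and $t'$ are terms in $O$ and $O'$ respectively
  - $s$ is the similarity between $t$ and $t'$, satisfying [...]
  - In addition, a relation $r$ between $t$ and $t'$ may be recorded; common relations include [...]
- Ontology matching: eliminating schema [...] ([...]-driven)

State of the art: linguistic features
- Linguistic descriptions of the terms in an ontology:
  - Local name: "For a name N in a namespace identified by a URI I, the namespace name is I. For a name N that is not in a namespace, the namespace name has no value. Definition: In either case the local name is N."
  - [...] → local [...]
  - Comment
  - Others: foaf:name, dc:title
- A survey of how linguistic features are currently used in ontologies:
  - Local names are used a lot, with some [...]
  - Comments [...]
  - Neighbors are not fully [...]
  - Dictionary lookup costs [...]
- (Table of check marks not recoverable; surviving labels: class, machine learning, ranking, S-[...])

Edit distance
- The minimum number of edit operations needed to turn one of two strings into the other
- Edit operations include substitution, insertion and deletion
- In general, the smaller the edit distance, the [higher] the similarity of the two strings

I-Sub
- $Sim(s_1, s_2) = Comm(s_1, s_2) - Diff(s_1, s_2) + winkler(s_1, s_2)$
- $Comm$: biggest common substring [between the] two [...]
- $Diff$: the length of unmatched [...] resulted from initial matching [...]

Virtual documents
- Linguistic description of a term: local name, [...], comment
- Linguistic description of a node: the linguistic descriptions of its forward neighbors
- Neighbors of a term: subject neighbors ($SN$), predicate neighbors ($PN$), object neighbors ($ON$)
- Virtual document of a term: its own description plus the weighted descriptions of its neighbors, $vdoc(t) = desc(t) + \gamma_1 \sum_{t' \in SN(t)} desc(t') + \gamma_2 \sum_{t' \in PN(t)} desc(t') + \gamma_3 \sum_{t' \in ON(t)} desc(t')$
- Vector space model: TF-[IDF]

String similarity metrics
- [...]
  - Less than two words per label: Jaro-[Winkler]
  - Two or more words per [label]:
    - Synonyms: Soft Jaccard, with Levenshtein base [...]
    - No synonyms: Soft Jaccard, with Levenshtein base [...]
- [...]
  - Less than two words per label: TF-[IDF]
  - Two or more words per [label]:
    - Synonyms: Soft TF-IDF, with Jaro-Winkler base [...]
    - Different languages: Soft TF-IDF, with Jaro-Winkler base [...]
    - Other: Soft TF-IDF, with Jaro-Winkler base [...]

Structural features
- Intuition: terms of two distinct ontologies are similar when adjacent terms are [similar]
- Similarity [...]: $\sigma^{i+1}(x, y) = \sigma^{i}(x, y) + \sum_{[\dots]} \sigma^{i}(a_u, b_u) \cdot w((a_u, b_u), (x, y)) + \sum_{[\dots]} \sigma^{i}(a_v, b_v) \cdot w((a_v, b_v), (x, y))$

Instance data
- Machine [...]
- Joint probability [...]
- Instance [...]
- Content [...]
- Name [...]
- Meta [...]
- Relaxation [...]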
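The edit distance described in the linguistic-feature slides above can be sketched as a small dynamic program. This is a minimal illustration, not the implementation used by any of the tools named in the deck; the function names and the length-based normalization in `edit_similarity` are assumptions made here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

With this normalization a smaller edit distance gives a higher similarity, which matches the rule of thumb stated on the slide.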
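Search engine-based similarity
- Normalized Google distance (NGD): $NGD(x, y) = \dfrac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$
  - $f(x)$ is the number of hits for the search term $x$
  - $f(y)$ is the number of hits for the search term $y$
  - $f(x, y)$ is the number of hits for the tuple of search terms $x$ [and $y$]
  - $M$ is the number of web pages indexed [...] ($M \approx 10^{[\dots]}$)

Ontology matching [...]
- Falcon-[...]
- New [...]
- A lot of (semi-)automatic algorithms and [...]
- Most are only applicable for small [...]
- Many applications require matching BIG [...]:
  - Medicine and biology: GALEN, FMA, [...]
  - Agriculture and food: AGROVOC, [...]
  - Library collections: Brinkman, [...]
  - Common knowledge: DBpedia, [...]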
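  - ≥ 10K [...]
- A divide-and-conquer [...]: 1. ontology partitioning → 2. block matching → 3. term [...]
- Running [...]

New directions: holistic ontology [matching]
- Increasing amount of data → simultaneously matching [...]
- Input: a set $\Omega = \{O_1, \dots, O_N\}$ of ontologies with $N > 2$
- Output: $A = A_{12} \cup A_{13} \cup A_{23} \cup \cdots$
- Guarantee to always find the same $A$: a global optimal [...]
- Limitation of pairwise [...]: $A$ is considered as a local solution depending on the order in which the ontology matching is carried [out], e.g. $A_{12} \cup A_{(12)3} \neq A_{13} \cup A_{(13)2} \neq A_{23} \cup A_{(23)1}$
- Holistic ontology [...]: extending [the] maximum-weighted graph matching problem with constraints (cardinality, structural and coherence [...])
  - Three types of [...]: class, object property, data[...]
  - Represent virtual connections between the same types of [...]
  - Have weights to represent similarities between the [...]
  - Correspondences (1:1) with maximum weight ↔ [...]
  - Linear constraints: binary [...]
  - Class decision [...] disjoint [...]

Entity linkage
- Semantic Web data have reached a scale in billions of [...]
- Many different entities refer to the same real-world [...]
- Typically denoted by URIs, from distributed data [...], e.g. Wei [...]
- Entity linkage: link different entities that refer to the same [...]
- a.k.a. coreference resolution, entity matching, record linkage [...]
- The entity matching problem was originally defined in 1959 by Newcombe et al. and was formalized by Fellegi and Sunter ten years later
- Out of 31B RDF statements, less than 500M are links across [...]
- Identification of entities [...]; elimination of [...] among the RDF data describing these identifiers [...]

State of the art
- In LOD, millions of entities have already been [...]
- However, potential candidates are still [...]
- Current [...]:
  - Equivalence [...]: owl:sameAs, inverse functional properties [...]
  - Similarity computation (also in the database [...]): compare properties and values of [...]

Equivalence [...]
- An RDF triple: $\langle s, p, o \rangle \in (\mathbf{U} \cup \mathbf{B}) \times \mathbf{U} \times (\mathbf{U} \cup \mathbf{B} \cup \mathbf{L})$
- Same-as relation: $\langle s, \text{owl:sameAs}, o \rangle \rightarrow \langle s, o \rangle \in S$ and $\langle o, s \rangle \in S$
- Inverse functional property (IFP) relation:
  - IFP: a value can only be the value of this property for a single [...]
  - e.g., $\langle s_1, \text{foaf:mbox}, o \rangle, \langle s_2, \text{foaf:mbox}, o \rangle \rightarrow \langle s_1, s_2 \rangle \in I$ and $\langle s_2, s_1 \rangle \in I$
- Functional property (FP) relation: [...]
- Cardinality relation: owl:cardinality / owl:maxCardinality = [...]
- $K = (S \cup I \cup F \cup C)^{+}$; $K$ is an equivalence [relation]

Similarity [...]
- The problem generally takes the following form: $\{\langle r, s \rangle \mid sim(r, s) > \tau,\ r \in R,\ s \in S\}$, where $R$ and $S$ are two sets of strings and $\tau$ is the similarity [threshold]
- Time complexity: $O(n^2 N^2)$
- The conventional existing approach is the "filter-verify" framework
  - Filtering stage: use various filtering methods to shrink the candidate set
  - Common methods: All-Pairs, ED-Join, PPJoin, PassJoin

Blocking
- Naïve pairwise: $N^2$ pairwise [comparisons]
  - 1,000 business listings each from 1,000 different cities across the [...]
  - 1 trillion comparisons, 11.6 days (if each comparison is 1 μs)
- Mentions from different cities are unlikely to be [...]
  - Blocking criterion: [...]
  - 1 billion comparisons, 16 minutes (if each comparison is 1 μs)
- Hash based [...]
- Pairwise similarity / neighborhood based blocking [...]
- Simple blocking: inverted [...]

Machine [...]
- A linkage [...]
- Learning [...]
- Genetic [...]
- Active [...]:
  - Selects link candidates to be labeled by a [...]
  - A human expert labels the selected link as correct or incorrect
  - The genetic programming algorithm evolves the population of linkage rules [...]

Limitations of current [...]
- At present, probably miss many potential [...]
- Similarity [...]: to improve, machine [...]; time-consuming and labor-intensive to build a large-scale training [...]

Definition
- Definition 1. Let $U$ be the set of entities in a set $D$ of data sources. Given [...], the entity linkage for $u$ is to query a [...] of [...] for which a relation $\varepsilon$ [...], where $\varepsilon$ links all the entities in $U$ that refer to the same object as $u$ does, [i.e.] are coreferent with [...]

How to combine? Our solution: query-driven entity [...]
- Use [...]: search/browsing; a system knows "what to link" only at query [...]
- Analyze small portions of a very large dataset to answer on-demand [...]

Our [...]
- Automatically infer semantically [...] entities based on OWL/SKOS [...]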
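The equivalence-reasoning slide above derives coreference from owl:sameAs statements and inverse functional properties and then takes a closure. A minimal sketch of that idea, using a union-find structure to obtain the symmetric and transitive closure, might look as follows. The function name `coreferent_groups`, the prefixed-name strings and the toy triples are assumptions for illustration; functional properties and cardinality restrictions, which the slide also lists, are left out.

```python
from collections import defaultdict

OWL_SAME_AS = "owl:sameAs"  # prefixed name used as a plain string in this sketch


def coreferent_groups(triples, inverse_functional_properties):
    """Group subjects linked by owl:sameAs or by sharing an IFP value."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    first_subject = {}  # (property, value) -> first subject seen with that value
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)
        elif p in inverse_functional_properties:
            # Two subjects with the same IFP value denote the same entity.
            union(first_subject.setdefault((p, o), s), s)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [g for g in groups.values() if len(g) > 1]


triples = [
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "foaf:mbox", "mailto:whu@example.org"),
    ("ex:c", "foaf:mbox", "mailto:whu@example.org"),
    ("ex:d", "foaf:name", "D"),
]
groups = coreferent_groups(triples, {"foaf:mbox"})
```

Here `ex:a`, `ex:b` and `ex:c` end up in one group: the first two through owl:sameAs, the third through the shared foaf:mbox value.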
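(Figure: workflow of the approach; surviving labels: Output: a [...] of [...]; 1 Build a [...] (initialize training [...]); Labeled [...]; Some properties to use together; External [...]; Learn [...]; Unresolved [...])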
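The comparison counts quoted on the blocking slide can be checked with a few lines of arithmetic, and the blocking idea itself fits in a short sketch. The 1 μs cost per comparison is an assumption recovered from the quoted totals (the unit is cut off on the slide), and `block_by`, `candidate_pairs` and the sample records are hypothetical names and data.

```python
from collections import defaultdict
from itertools import combinations


def block_by(records, key):
    """Partition records into blocks that share the same blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[key(record)].append(record)
    return blocks


def candidate_pairs(blocks):
    """Only records inside the same block are compared with each other."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Arithmetic from the slide: 1,000 listings in each of 1,000 cities.
cities, per_city = 1_000, 1_000
n = cities * per_city                      # 1,000,000 mentions
naive = n ** 2                             # N^2 pairwise comparisons
blocked = cities * per_city ** 2           # compare within each city only
COMPARISONS_PER_SECOND = 1_000_000         # assumed: 1 microsecond each

naive_days = naive / COMPARISONS_PER_SECOND / 86_400
blocked_minutes = blocked / COMPARISONS_PER_SECOND / 60

# Toy usage: block business listings by city (second tuple field).
records = [("Acme", "Nanjing"), ("ACME Inc.", "Nanjing"), ("Acme", "Boston")]
pairs = list(candidate_pairs(block_by(records, key=lambda r: r[1])))
```

Under these assumptions `naive` is one trillion comparisons (about 11.6 days) and `blocked` is one billion (between 16 and 17 minutes), in line with the figures on the slide. In the toy data only the two Nanjing listings form a candidate pair.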
Assumptions: (1) coreferent entities share similar property-value pairs; (2) a few property-value pairs are more important for linking entities

Running [example]
(Figure; surviving values: "Nanjing", "Nan-ching", "32N", "118E", "117W")

Some [...]
- Discriminability of a property [...]
  - Property [...]
  - Non-coreferent entity [...] in terms of coreferent [...]
- Discriminability of a value
- Discriminability of a prop-value [...]

(Table residue, structure not recoverable: > 100; > 100; RDF; > 2; Same-as, IFP, FP; 2 [...])

Billion Triples Challenge (BTC) [...]
- Testing [...]: top-50 in 364 thousand query [...]; 8 [...]; Music/[...]; 54323

Evaluation procedure and [...]
- 30 graduates, 2 judges + 1 arbitrator per link, Fleiss's κ = 0.8 (sufficient [...])
- Precision & relative recall: RR = correct links in one system / total correct unique links in all [...]
- Maximum iteration = [...]
- Discriminability threshold = [...]
- Linkage [...]
- Running time on 5,000 samples: avg. 11.3 links in [...]

Ontology Alignment Evaluation [...]
- ISWC workshop since [...]
- Ontology matching & instance matching

Part III: an application to data [integration]
- Metadata is vital to multimedia content [...]: search, browsing, management [...]
- Large-scale LOD are published and [...]: make use of such rich source of [...]
- Existing multimedia metadata models and [...] do not provide formal [...]; typically focus on a single media [...]
  - EXIF [...]
  - [...]patible with MPEG-[...]
- Different media types co-exist in a multimedia [...]: a movie may have a theme music and a [...]
- A unified, well-defined ontology (with its [mapping]s to others) needed to gain [...]
- Challenge: link and integrate heterogeneous [...]

A motivating [example]: Beauty and the [Beast]
- Low-level metadata: runtime, location [...]
- LOD: LinkedMDB, DBpedia [...]
- Different ontologies (terms), different [...]
- (Figure; surviving labels: linkedmdb:film, linkedmdb:director, linkedmdb:11264, "Beauty and the Beast", [...] and entity linkage, runtime "91 min.", location, "... is a 1991 American animated [...]")
Our [...]
- CAMO: enrich multimedia metadata via integrating [...]
- Select DBpedia as the mediation and match with [...]
- Link DBpedia entities with other [...] and aggregate their [...]
- Incorporate legacy relational databases [...]
- Moreover, provide a mobile app for browsing and [...] multimedia content on Android [...]
- Assess the advantages of integrating LOD into multimedia [...]

System [...]: client-server
- Server
  - The DBpedia 3.6 ontology as [...]
  - Global-as-View solution of [...]
  - Music: DBpedia, DBTune, [...]
  - Movie: DBpedia, LinkedMDB, [...]
- Client
  - Android-based mobile [...]
  - Integrate with a multimedia [...]
  - Search & browse multimedia [...]

System [...]
- Search, browse and [...]
(Figure; surviving labels: Instance, mobile, John, relational, Entity, Ontology, Data)

Matching ontologies with [...]
- Different LOD sources have different preferences on [...]: DBpedia, Music ontology [...]
- Falcon-AO: an automatic ontology matching [...]
  - Extend [...] knowledge to support synonym [...], e.g. track vs. [...]
  - Linguistic matching: V-Doc (TF-IDF) & I-Sub (edit [...])
  - Structural matching: GMO (similarity [...])

Linking entities with [...]
- Entity linkage helps merge all descriptions in different sources that [refer] to the same multimedia [...]
- (Figure; surviving labels: Training [...]; {p1, p3} ∈ c1 vs. {p5, p6} ∈ c3; {p1, p2} ∈ c1 vs. {p3, p4} ∈ c2; Instance linkage)
- Training set [...]
  - Negative examples: do not hold equivalence relation [...]
- Class-based discriminative property [...]
  - Information [...]
- Online [...]

Integrating legacy relational [...]
- There are still a great deal of legacy data stored in [...]
- Some data in LOD are generated from their relational [...]
- Element [...], e.g., entity table and relationship [...]
- Element [...]
- Instance [...]
- [...] similar to ontology matching and entity linkage [...]

[Evaluation]
- Two [...]:
  - Usability and effectiveness of the mobile [...]
  - Integration accuracy in the [...]

User [...] (1): usability & [...]
- 3 comparative [...]: [...]; [...]; Wikipedia Android [...]
- 6 testing [...]
- 50 [...]: 10 [...], 22 [...], 18 [...]
- System Usability Scale (SUS) & post-task [...]
- Post-task [...]: analyze the result according to the typology of the [...]

Integration [...]
- Ontology [...]: 78 [...], incl. 18 RDB [...]
- Entity [...]: 60 thousand [...]; 100 samples per [...]

Lessons
- CAMO leverages ontology matching and entity linkage for data integration and supports users to browse and search multimedia content on mobile devices
- Lessons [...]
  - Ontology matters: trade-off between expressiveness and ease of [...]
  - Data integration quality: human computation + machine [...]
  - Mobile app design: conciseness, ranking scheme, user-[...]
- Future [...]
  - Generate complex [mapping]s for semantic query [...]
  - Extend to user-generated [...]
  - NLP [...]