版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、OutlineIntroductionBackgroundDistributed Database DesignDatabase IntegrationSchema MatchingSchema MappingSemantic Data ControlDistributed Query ProcessingMultimedia Query ProcessingDistributed Transaction ManagementData ReplicationParallel Database SystemsDistributed Object DBMSPeer-to-Peer Data Man
2、agementWeb Data Management Current IssuesProblem DefinitionGiven existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)GCS is also called mediated schemaBottom-up design processIntegration AlternativesPhysical integrationSource
3、databases integrated and the integrated database is materializedData warehousesLogical integrationGlobal conceptual schema is virtual and not materializedEnterprise Information Integration (EII)Data Warehouse ApproachBottom-up DesignGCS (also called mediated schema) is defined firstMap LCSs to this
4、schemaAs in data warehousesGCS is defined as an integration of parts of LCSsGenerate GCS and map LCSs to this GCSGCS/LCS RelationshipLocal-as-viewThe GCS definition is assumed to exist, and each LCS is treated as a view definition over itGlobal-as-viewThe GCS is defined as a set of views over the LC
5、SsDatabase Integration ProcessRecall Access ArchitectureDatabase Integration IssuesSchema translationComponent database schemas translated to a common intermediate canonical representationSchema generationIntermediate schemas are used to create a global conceptual schemaSchema TranslationWhat is the
6、 canonical data model?RelationalEntity-relationshipDIKEObject-orientedARTEMISGraph-orientedDIPE, TranScm, COMA, CupidPreferable with emergence of XMLNo common graph formalismMapping algorithmsThese are well-knownSchema GenerationSchema matchingFinding the correspondences between multiple schemasSche
7、ma integrationCreation of the GCS (or mediated schema) using the correspondencesSchema mappingHow to map data from local databases to the GCSImportant: sometimes the GCS is defined first and schema matching and schema mapping is done against this target GCSRunning ExampleEMP(ENO, ENAME, TITLE)PROJ(P
8、NO, PNAME, BUDGET, LOC, CNAME)ASG(ENO, PNO, RESP, DUR)PAY(TITLE, SAL)RelationalE-R ModelSchema MatchingSchema heterogeneityStructural heterogeneityType conflictsDependency conflictsKey conflictsBehavioral conflictsSemantic heterogeneityMore important and harder to deal withSynonyms, homonyms, hypern
9、ymsDifferent ontologyImprecise wordingSchema Matching (contd)Other complicationsInsufficient schema and instance informationUnavailability of schema documentationSubjectivity of matchingIssues that affect schema matchingSchema versus instance matchingElement versus structure level matchingMatching c
10、ardinalitySchema Matching ApproachesLinguistic Schema MatchingUse element names and other textual information (textual descriptions, annotations) May use external sources (e.g., Thesauri)SC1.element-1 SC2.element-2, p,sElement-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p hold
11、s with a similarity value of sSchema levelDeal with names of schema elementsHandle cases such as synonyms, homonyms, hypernyms, data type similaritiesInstance levelFocus on information retrieval techniques (e.g., word frequencies, key terms)“Deduce” similarities from theseLinguistic MatchersUse a se
12、t of linguistic (terminological) rulesBasic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)Predicate p and similarity value s hand-crafted specified, discovered may be computed or specified by an expert after discoveryExamplesuppercase names lower case names, true
13、, 1.0uppercase names capitalized names, true, 1.0capitalized names lower case names, true, 1.0DB1.ASG DB2.WORKS_IN, true, 0.8Automatic Discovery of Name SimilaritiesAffixesCommon prefixes and suffixes between two element name stringsN-gramsComparing how many substrings of length n are common between
14、 the two name stringsEdit distanceNumber of character modifications (additions, deletions, insertions) that needs to be performed to convert one string into the otherSoundex codePhonetic similarity between names based on their soundex codesAlso look at data typesData type similarity may suggest stro
15、nger relationship than the computed similarity using these methods or to differentiate between multiple strings with same valueN-gram Example3-grams of string “Responsibility” are the following:Res sibibi espbip spoili ponlitonsitynsi3-grams of string “Resp” areResesp3-gram similarity: 2/12 = 0.17Ed
16、it Distance ExampleAgain consider “Responsibility” and “Resp”To convert “Responsibility” to “Resp”Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”To convert “Resp” to “Responsibility”Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”The number of edit operations requir
17、ed is 10Similarity is 1 (10/14) = 0.29Constraint-based MatchersData always have constraints use themData type informationValue rangesExamplesRESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.19 (low)If they come from the same domain, this may increase their similarity v
18、alueENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-RENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRINGConstraint-based Structural MatchingIf two schema elements are structurally similar, then there is a higher likelihood that they represent the same conceptStr
19、uctural similarity:Same properties (attributes)“Neighborhood” similarityUsing graph representationThe set of nodes that can be reached within a particular path length from a node are the neighbors of that nodeIf two concepts (nodes) have similar set of neighbors, they are likely to represent the sam
20、e conceptLearning-based Schema MatchingUse machine learning techniques to determine schema matchesClassification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar conceptsSimilarity is defined according t
21、o features of data instancesClassification is “learned” from a training setLearning-based Schema MatchingCombined Schema Matching ApproachesUse multiple matchersEach matcher focuses on one area (name, etc)Meta-matcher integrates these into one predictionIntegration may be simple (take average of sim
22、ilarity values) or more complex (see Fagins work)Schema IntegrationUse the correspondences to create a GCSMainly a manual process, although rules can helpBinary Integration MethodsN-ary Integration MethodsSchema MappingMapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.Data warehouses actual translationData integration systems discover mappings that ca
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 浙江银行招聘-招商银行宁波分行2026年社会招聘考试备考题库及答案解析
- 2026吉林高速公路集团有限公司白城分公司劳务派遣项目招聘2人考试参考试题及答案解析
- 2026湖南常德市自来水有限责任公司遴选9人笔试模拟试题及答案解析
- 2026年保山市昌宁县机关事务管理局招聘编外工作人员(1人)考试参考试题及答案解析
- 2026上半年北京门头沟区卫生健康系统事业单位招聘卫生专业技术人员考试备考试题及答案解析
- 2026广东肇庆市怀集县诗洞镇人民政府招聘镇派驻村(社区)党群服务中心工作人员4人考试参考试题及答案解析
- 2026内蒙古乌海市狮城资管运营管理有限责任公司招聘财务人员1人笔试模拟试题及答案解析
- 2026浙江台州椒江区山海幼儿园海尚望府园招聘劳务派遣工作人员1人考试参考试题及答案解析
- 2026年绥化市城市管理综合执法局所属事业单位城市运行服务中心公开选调工作人员8人考试备考题库及答案解析
- 2026重庆巴岳保安服务有限公司招聘1人考试备考题库及答案解析
- DZ∕T 0248-2014 岩石地球化学测量技术规程(正式版)
- JTJ-T-257-1996塑料排水板质量检验标准-PDF解密
- 残疾人法律维权知识讲座
- 沥青维护工程投标方案技术标
- 水电站建筑物课程设计
- 儿童行为量表(CBCL)(可打印)
- 硒功能与作用-课件
- 《英语教师职业技能训练简明教程》全册配套优质教学课件
- DB53∕T 1034-2021 公路隧道隐蔽工程无损检测技术规程
- 同步工程的内涵、导入和效果
- DB32∕T 2349-2013 杨树一元立木材积表
评论
0/150
提交评论