Outline
- Introduction
- Background
- Distributed Database Design
- Database Integration
  - Schema Matching
  - Schema Mapping
- Semantic Data Control
- Distributed Query Processing
- Multimedia Query Processing
- Distributed Transaction Management
- Data Replication
- Parallel Database Systems
- Distributed Object DBMS
- Peer-to-Peer Data Management
- Web Data Management
- Current Issues

Problem Definition
- Given existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)
  - GCS is also called mediated schema
- Bottom-up design process

Integration Alternatives
- Physical integration
  - Source databases integrated and the integrated database is materialized
  - Data warehouses
- Logical integration
  - Global conceptual schema is virtual and not materialized
  - Enterprise Information Integration (EII)

Data Warehouse Approach

Bottom-up Design
- GCS (also called mediated schema) is defined first
  - Map LCSs to this schema
  - As in data warehouses
- GCS is defined as an integration of parts of LCSs
  - Generate GCS and map LCSs to this GCS

GCS/LCS Relationship
- Local-as-view
  - The GCS definition is assumed to exist, and each LCS is treated as a view definition over it
- Global-as-view
  - The GCS is defined as a set of views over the LCSs

Database Integration Process

Recall Access Architecture

Database Integration Issues
- Schema translation
  - Component database schemas translated to a common intermediate canonical representation
- Schema generation
  - Intermediate schemas are used to create a global conceptual schema

Schema Translation
- What is the canonical data model?
  - Relational
  - Entity-relationship
    - DIKE
  - Object-oriented
    - ARTEMIS
  - Graph-oriented
    - DIPE, TranScm, COMA, Cupid
    - Preferable with emergence of XML
    - No common graph formalism
- Mapping algorithms
  - These are well-known

Schema Generation
- Schema matching
  - Finding the correspondences between multiple schemas
- Schema integration
  - Creation of the GCS (or mediated schema) using the correspondences
- Schema mapping
  - How to map data from local databases to the GCS
- Important: sometimes the GCS is defined first, and schema matching and schema mapping are done against this target GCS

Running Example
- Relational
  - EMP(ENO, ENAME, TITLE)
  - PROJ(PNO, PNAME, BUDGET, LOC, CNAME)
  - ASG(ENO, PNO, RESP, DUR)
  - PAY(TITLE, SAL)
- E-R Model

Schema Matching
- Schema heterogeneity
  - Structural heterogeneity
    - Type conflicts
    - Dependency conflicts
    - Key conflicts
    - Behavioral conflicts
  - Semantic heterogeneity
    - More important and harder to deal with
    - Synonyms, homonyms, hypernyms
    - Different ontology
    - Imprecise wording

Schema Matching (contd)
- Other complications
  - Insufficient schema and instance information
  - Unavailability of schema documentation
  - Subjectivity of matching
- Issues that affect schema matching
  - Schema versus instance matching
  - Element versus structure level matching
  - Matching cardinality

Schema Matching Approaches

Linguistic Schema Matching
- Use element names and other textual information (textual descriptions, annotations)
- May use external sources (e.g., thesauri)
- ⟨SC1.element-1 ≈ SC2.element-2, p, s⟩
  - Element-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p holds with a similarity value of s
- Schema level
  - Deal with names of schema elements
  - Handle cases such as synonyms, homonyms, hypernyms, data type similarities
- Instance level
  - Focus on information retrieval techniques (e.g., word frequencies, key terms)
  - "Deduce" similarities from these

Linguistic Matchers
- Use a set of linguistic (terminological) rules
- Basic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)
- Predicate p and similarity value s
  - Hand-crafted rules: specified
  - Discovered rules: may be computed or specified by an expert after discovery
- Examples
  - ⟨uppercase names ≈ lower case names, true, 1.0⟩
  - ⟨uppercase names ≈ capitalized names, true, 1.0⟩
  - ⟨capitalized names ≈ lower case names, true, 1.0⟩
  - ⟨DB1.ASG ≈ DB2.WORKS_IN, true, 0.8⟩

Automatic Discovery of Name Similarities
- Affixes
  - Common prefixes and suffixes between two element name strings
- N-grams
  - Comparing how many substrings of length n are common between the two name strings
- Edit distance
  - Number of character modifications (additions, deletions, insertions) that need to be performed to convert one string into the other
- Soundex code
  - Phonetic similarity between names based on their soundex codes
- Also look at data types
  - Data type similarity may suggest a stronger relationship than the similarity computed using these methods, or may help differentiate between multiple strings with the same value

N-gram Example
- 3-grams of string "Responsibility" are the following:
  - Res, esp, spo, pon, ons, nsi, sib, ibi, bil, ili, lit, ity
- 3-grams of string "Resp" are
  - Res, esp
- 3-gram similarity: 2/12 = 0.17

Edit Distance Example
- Again consider "Responsibility" and "Resp"
- To convert "Responsibility" to "Resp"
  - Delete characters "o", "n", "s", "i", "b", "i", "l", "i", "t", "y"
- To convert "Resp" to "Responsibility"
  - Add characters "o", "n", "s", "i", "b", "i", "l", "i", "t", "y"
- The number of edit operations required is 10
- Similarity is 1 - (10/14) = 0.29

Constraint-based Matchers
- Data always have constraints; use them
  - Data type information
  - Value ranges
- Examples
  - RESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.29 (low)
    - If they come from the same domain, this may increase their similarity value
  - ENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-R
    - ENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRING

Constraint-based Structural Matching
- If two schema elements are structurally similar, then there is a higher likelihood that they represent the same concept
- Structural similarity:
  - Same properties (attributes)
  - "Neighborhood" similarity
    - Using graph representation
    - The set of nodes that can be reached within a particular path length from a node are the neighbors of that node
    - If two concepts (nodes) have similar sets of neighbors, they are likely to represent the same concept

Learning-based Schema Matching
- Use machine learning techniques to determine schema matches
- Classification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar concepts
- Similarity is defined according to features of data instances
- Classification is "learned" from a training set

Combined Schema Matching Approaches
- Use multiple matchers
  - Each matcher focuses on one area (name, etc.)
- Meta-matcher integrates these into one prediction
- Integration may be simple (take average of similarity values) or more complex (see Fagin's work)

Schema Integration
- Use the correspondences to create a GCS
- Mainly a manual process, although rules can help

Binary Integration Methods

N-ary Integration Methods

Schema Mapping
- Mapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target
- Data warehouses: actual translation
- Data integration systems: discover mappings that ca
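
The figures in the N-gram Example and Edit Distance Example (0.17 and 0.29 for "Responsibility" versus "Resp") can be reproduced with a short Python sketch. Two choices here are assumptions, since the slides do not spell them out: the n-gram score divides the number of shared distinct n-grams by the size of the larger n-gram set, and the edit distance is the standard Levenshtein distance (which also allows substitutions), normalized by the longer string's length.

```python
def ngrams(s, n=3):
    """Return the set of distinct length-n substrings of s."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Shared n-grams divided by the size of the larger n-gram set (assumed definition)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / max(len(ga), len(gb))


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 - distance / length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


print(round(ngram_similarity("Responsibility", "Resp"), 2))  # 0.17
print(edit_distance("Responsibility", "Resp"))               # 10
print(round(edit_similarity("Responsibility", "Resp"), 2))   # 0.29
```

Both scores are low even though RESP is an obvious abbreviation of RESPONSIBILITY, which is why the slides suggest consulting constraints such as data types as well.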
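
The rule examples under Linguistic Matchers can be read as a small lookup procedure. The sketch below is only an illustration: `EXPLICIT_RULES` and `linguistic_similarity` are hypothetical names, the predicate p is always taken to be true as in the examples, and returning 0.0 when no rule applies is an assumption.

```python
# Hand-crafted correspondences, stored as unordered pairs with a similarity value.
# The single entry mirrors the DB1.ASG / DB2.WORKS_IN rule from the examples.
EXPLICIT_RULES = {
    frozenset({"DB1.ASG", "DB2.WORKS_IN"}): 0.8,
}


def linguistic_similarity(e1, e2):
    """Return a similarity value for two qualified element names."""
    pair = frozenset({e1, e2})
    if pair in EXPLICIT_RULES:
        return EXPLICIT_RULES[pair]
    # Case rules: uppercase, lower case and capitalized spellings of the
    # same name are treated as identical (similarity 1.0).
    name1 = e1.split(".")[-1]
    name2 = e2.split(".")[-1]
    if name1.lower() == name2.lower():
        return 1.0
    return 0.0  # assumed default when no rule applies


print(linguistic_similarity("DB1.ASG", "DB2.WORKS_IN"))  # 0.8
print(linguistic_similarity("DB1.EMP", "DB2.Emp"))       # 1.0
```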
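
The "neighborhood" similarity described under Constraint-based Structural Matching can be sketched as a bounded breadth-first search followed by a set comparison. This is a minimal sketch under stated assumptions: each schema is a hypothetical adjacency dictionary, neighbor names are assumed to be already normalized so they can be compared across schemas, and Jaccard overlap is one possible way to score "similar sets of neighbors".

```python
from collections import deque


def neighbors_within(graph, start, max_len):
    """Nodes reachable from start along paths of at most max_len edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_len:
            continue  # do not expand beyond the path-length limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    seen.discard(start)
    return seen


def neighborhood_similarity(g1, n1, g2, n2, max_len=1):
    """Jaccard overlap of the two neighbor sets (assumed scoring function)."""
    a = neighbors_within(g1, n1, max_len)
    b = neighbors_within(g2, n2, max_len)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical schema graphs: each concept points to its attributes.
relational = {"EMP": ["ENO", "ENAME", "TITLE"]}
er_model = {"WORKER": ["ENO", "ENAME", "CITY"]}
print(neighborhood_similarity(relational, "EMP", er_model, "WORKER"))  # 0.5
```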
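
The simple integration strategy named under Combined Schema Matching Approaches (take the average of the similarity values) can be sketched as follows. The two individual matchers are hypothetical stand-ins chosen to keep the example short; any function that returns a score in [0, 1] could be plugged in.

```python
from statistics import mean


def case_insensitive_matcher(a, b):
    """1.0 if the names differ only in letter case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def prefix_matcher(a, b):
    """Length of the common prefix divided by the longer name's length."""
    a, b = a.lower(), b.lower()
    common = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        common += 1
    longest = max(len(a), len(b))
    return common / longest if longest else 1.0


def meta_match(a, b, matchers):
    """Meta-matcher: average the scores of the individual matchers."""
    return mean(m(a, b) for m in matchers)


matchers = [case_insensitive_matcher, prefix_matcher]
print(meta_match("RESP", "Resp", matchers))                      # 1.0
print(round(meta_match("RESP", "RESPONSIBILITY", matchers), 2))  # 0.14
```

A plain average weights every matcher equally; a weighted average would be a small extension if some matchers are known to be more reliable than others.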