分布式数据库系统_第1页
分布式数据库系统_第2页
分布式数据库系统_第3页
分布式数据库系统_第4页
分布式数据库系统_第5页
已阅读5页,还剩26页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、OutlineIntroductionBackgroundDistributed Database DesignDatabase IntegrationSchema MatchingSchema MappingSemantic Data ControlDistributed Query ProcessingMultimedia Query ProcessingDistributed Transaction ManagementData ReplicationParallel Database SystemsDistributed Object DBMSPeer-to-Peer Data Man

2、agementWeb Data Management Current IssuesProblem DefinitionGiven existing databases with their Local Conceptual Schemas (LCSs), how to integrate the LCSs into a Global Conceptual Schema (GCS)GCS is also called mediated schemaBottom-up design processIntegration AlternativesPhysical integrationSource

3、databases integrated and the integrated database is materializedData warehousesLogical integrationGlobal conceptual schema is virtual and not materializedEnterprise Information Integration (EII)Data Warehouse ApproachBottom-up DesignGCS (also called mediated schema) is defined firstMap LCSs to this

4、schemaAs in data warehousesGCS is defined as an integration of parts of LCSsGenerate GCS and map LCSs to this GCSGCS/LCS RelationshipLocal-as-viewThe GCS definition is assumed to exist, and each LCS is treated as a view definition over itGlobal-as-viewThe GCS is defined as a set of views over the LC

5、SsDatabase Integration ProcessRecall Access ArchitectureDatabase Integration IssuesSchema translationComponent database schemas translated to a common intermediate canonical representationSchema generationIntermediate schemas are used to create a global conceptual schemaSchema TranslationWhat is the

6、 canonical data model?RelationalEntity-relationshipDIKEObject-orientedARTEMISGraph-orientedDIPE, TranScm, COMA, CupidPreferable with emergence of XMLNo common graph formalismMapping algorithmsThese are well-knownSchema GenerationSchema matchingFinding the correspondences between multiple schemasSche

7、ma integrationCreation of the GCS (or mediated schema) using the correspondencesSchema mappingHow to map data from local databases to the GCSImportant: sometimes the GCS is defined first and schema matching and schema mapping is done against this target GCSRunning ExampleEMP(ENO, ENAME, TITLE)PROJ(P

8、NO, PNAME, BUDGET, LOC, CNAME)ASG(ENO, PNO, RESP, DUR)PAY(TITLE, SAL)RelationalE-R ModelSchema MatchingSchema heterogeneityStructural heterogeneityType conflictsDependency conflictsKey conflictsBehavioral conflictsSemantic heterogeneityMore important and harder to deal withSynonyms, homonyms, hypern

9、ymsDifferent ontologyImprecise wordingSchema Matching (contd)Other complicationsInsufficient schema and instance informationUnavailability of schema documentationSubjectivity of matchingIssues that affect schema matchingSchema versus instance matchingElement versus structure level matchingMatching c

10、ardinalitySchema Matching ApproachesLinguistic Schema MatchingUse element names and other textual information (textual descriptions, annotations) May use external sources (e.g., Thesauri)SC1.element-1 SC2.element-2, p,sElement-1 in schema SC1 is similar to element-2 in schema SC2 if predicate p hold

11、s with a similarity value of sSchema levelDeal with names of schema elementsHandle cases such as synonyms, homonyms, hypernyms, data type similaritiesInstance levelFocus on information retrieval techniques (e.g., word frequencies, key terms)“Deduce” similarities from theseLinguistic MatchersUse a se

12、t of linguistic (terminological) rulesBasic rules can be hand-crafted or may be discovered from outside sources (e.g., WordNet)Predicate p and similarity value s hand-crafted specified, discovered may be computed or specified by an expert after discoveryExamplesuppercase names lower case names, true

13、, 1.0uppercase names capitalized names, true, 1.0capitalized names lower case names, true, 1.0DB1.ASG DB2.WORKS_IN, true, 0.8Automatic Discovery of Name SimilaritiesAffixesCommon prefixes and suffixes between two element name stringsN-gramsComparing how many substrings of length n are common between

14、 the two name stringsEdit distanceNumber of character modifications (additions, deletions, insertions) that needs to be performed to convert one string into the otherSoundex codePhonetic similarity between names based on their soundex codesAlso look at data typesData type similarity may suggest stro

15、nger relationship than the computed similarity using these methods or to differentiate between multiple strings with same valueN-gram Example3-grams of string “Responsibility” are the following:Res sibibi espbip spoili ponlitonsitynsi3-grams of string “Resp” areResesp3-gram similarity: 2/12 = 0.17Ed

16、it Distance ExampleAgain consider “Responsibility” and “Resp”To convert “Responsibility” to “Resp”Delete characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”To convert “Resp” to “Responsibility”Add characters “o”, “n”, “s”, “i”, “b”, “i”, “l”, “i”, “t”, “y”The number of edit operations requir

17、ed is 10Similarity is 1 (10/14) = 0.29Constraint-based MatchersData always have constraints use themData type informationValue rangesExamplesRESP and RESPONSIBILITY: n-gram similarity = 0.17, edit distance similarity = 0.19 (low)If they come from the same domain, this may increase their similarity v

18、alueENO in relational, WORKER.NUMBER and PROJECT.NUMBER in E-RENO and WORKER.NUMBER may have type INTEGER while PROJECT.NUMBER may have STRINGConstraint-based Structural MatchingIf two schema elements are structurally similar, then there is a higher likelihood that they represent the same conceptStr

19、uctural similarity:Same properties (attributes)“Neighborhood” similarityUsing graph representationThe set of nodes that can be reached within a particular path length from a node are the neighbors of that nodeIf two concepts (nodes) have similar set of neighbors, they are likely to represent the sam

20、e conceptLearning-based Schema MatchingUse machine learning techniques to determine schema matchesClassification problem: classify concepts from various schemas into classes according to their similarity. Those that fall into the same class represent similar conceptsSimilarity is defined according t

21、o features of data instancesClassification is “learned” from a training setLearning-based Schema MatchingCombined Schema Matching ApproachesUse multiple matchersEach matcher focuses on one area (name, etc)Meta-matcher integrates these into one predictionIntegration may be simple (take average of sim

22、ilarity values) or more complex (see Fagins work)Schema IntegrationUse the correspondences to create a GCSMainly a manual process, although rules can helpBinary Integration MethodsN-ary Integration MethodsSchema MappingMapping data from each local database (source) to GCS (target) while preserving semantic consistency as defined in both source and target.Data warehouses actual translationData integration systems discover mappings that ca

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论