分布式数据库设计(共82张PPT)_第1页
分布式数据库设计(共82张PPT)_第2页
分布式数据库设计(共82张PPT)_第3页
分布式数据库设计(共82张PPT)_第4页
分布式数据库设计(共82张PPT)_第5页
已阅读5页,还剩77页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

分布式数据库设计A

FRAMEWORK

FOR

DISTRIBUTEDDATABASE

DESIGN

(概述)THE

DESIGN

OF

DATABASEFRAGMENTATION

(分片设计)THE

ALLOCATION

OF

FRAGMENTS

(分配设计)University

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.1Level

of

sharing

共享维

不共享,数据共享,数据+程序共享Access

pattern

behavior访问模式维

静态模式,动态

模式(分布式数据库设计与查询处理)Level

of

knowledge

访问模式知识维用户完全已知或部

分已知访问模式分布式系统设计的三维Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.2存取模式动态静态部分信息数据知识级别数据+程序完整信息共享Dimensions

of

the

ProblemUniversity

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.3集中式数据库设计1.

Designing

the

"conceptual

schema"which

describes

the

integrated

database2.

Designing

the

"physical

database,"i.e.,

mapping

the

conceptual

schema

to

storage

areas

and

determining

appropriate

accessmethods.University

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.4分布式数据库设计的特殊要求+3.Designing

the

fragmentation.+4.

Designing

the

allocation

of

fragments,i.e.mapped

to

physical

images;

also

the

replication

of

fragments

is

determined.University

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.5关于分片和分配的几点注意Fragmentation

design

been

partially

analyzed

in

centralized

systemswith

multiple

storage

devices.The

allocation

problem

has

been

studied

as

the

"file

allocation

problem."The

distinction

between

two

problems

is

conceptually

relevantone

deals

with

the

"logical

criteria"

which

motivate

thefragmentation

of

a

global

relationone

deals

with

the

"physical"

placement

of

data

at

the

various

sites.这两个问题通常是相互关联的,不可能独立地解决它们而

能确定最优的fragmentaion和allocationUniversity

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.6关于APPLICATION考虑因素:分布式数据库设计包括:分布式数据库设计和相应的分布式应用设计1.

The

site

from

which

the

application

is

issued

(site

of

origin

of

theapplication).2.

The

frequency

of

activation

of

the

application

(i.e.,在单位时间内被激活

的次数);

applications

which

can

be

issued

at

multiple

sites,

we

need

to

know

the

frequency

ofactivation

of

each

application

at

each

site.3.

The

number,

type,

and

the

statistical

distribution

of

accesses

made

byeach

application

to

each

required

data

"object."University

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.7设计目标

(Objectives)Processing

locality数据处理的本地性Availability

and

reliability

of

distributed

data

布式数据的有效性和可靠性冗余控制Workload

distribution

工作负荷的合理分布Storage

costs

and

availability存储能力和费用Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.8Processing

localityMaximizeprocessing

locality

corresponds

to

the

simple

principle

ofplacing

data

as

close

as

possible

to

the

applications

which

usethem.Maximizing

processing

locality

(minimizing

remote

references)

canbe

done

by

adding

the

number

of

local

and

remote

referencescorresponding

to

each

candidate

fragmentation

and

fragment,

allocation,

and

selecting

the

best

solution

among

them.The

advantage

of

complete

locality

is

not

only

the

reduction

of

remoteaccesses,

but

also

the

increased

simplicity

in

controlling

the

executionof

theapplication.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.9Availability

and

reliability

of

distributed

dataA

high

degree

of

availability

for

read-only

applications

isachieved

by

storing

multiple

copies

of

the

same

information;the

system

must

be

able

to

switch

to

an

alternative

copy

when

the

one

that

should

be

accessed

under

normal

conditions

is

not

available.Reliability

is

also

achieved

by

storing

multiple

copies

of

the

same

information

-

possible

to

recover

from

crashes

orfrom

the

physical

destruction

of

one

of

the

copies

by

using

the

other

still

available

copies.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.10Workload

distributionAn

important

feature

of

distributed

computersystems.To

take

advantage

of

the

different

powers

or

utilizations

of

computers

at

each

site,Maximize

the

degree

of

parallelism

of

execution

of

applications.workload

distribution

might

negatively

affectprocessing

locality

-

to

consider

the

trade-offDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.11Storage

costs

and

availabilityShould

reflect

the

cost

and

availability

of

storage

at

the

different

sites.

It

is

possible

to

have

specialized

sites

in

the

network

for

data

storage,

orconversely

to

have

sites

which

do

not

support

mass

storage

at

all.通常存储的费用并不是非常重要(Compared

to

CPU,I/O,

Transmission

of

network).Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.12设计方法Top-Down

Approach自顶向下Bottom-Up

Approach自底向上Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.13Top-down

approach已

有DB..如何分割数据及如何分配这些数据到不同站点过程start

by

designing

the

global

schemadesigning

the

fragmentation

of

the

databasethen

by

allocating

the

fragments

to

the

sites,

creating

the

physicalimagesThe

approach

is

completed

by

performing,

at

each

site,

the

"physical

design"

of

the

data

which

are

allocated

to

it.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.14AccessInformationDistributionDesignObjectivesUser

InputView

IntegrationTop-Down

DesignRequirements

AnalysisConceptual

DesignPhysical

DesignView

DesignUser

InputLCS'sLIS'sGCSES's特点能先看到雏形问题:

When

the

distributed

database

is

developed

as

the

aggregation

ofexisting

databases,

it

is

not

easy

to

follow

the

top-down

approach.

The

global

schema

is

often

produced

as

a

compromise

between

existing

data

descriptions.Distributed

DBMSUniversity

of

Shanghai

for

Science

and

Technology

Page

2.16Bottom-up

approachExisting

databases

are

aggregated(还可能是异构heterogeneous

或完全自治autonomous),

无设计问题(信息集成)!Based

on

the

integration

of

existing

schemata

into

a

single,

global

schema.By

integration,

the

merging

of

common

datadefinitions

and

the

resolution

of

conflicts

among

different

representations

given

to

the

same

data.University

of

Shanghai

for

Science

and

TechnologyDistributed

DBMSPage

2.17bottom-up

approachHorizontal

fragments

of

a

same

global

relation

must

have

thesame

relation

schema

-

easily

enforced

in

a

top-downdesign,

while

it

is

difficult

to

"discover"

it.

The

integrationprocess

should

attempt

to

modify

the

definitions

of

localrelations,

so

that

they

can

be

regarded

as

horizontal

fragments

of

a

common,

global

relation.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.18The

selection

of

a

common

database

modelfor

describing

the

global

schema

of

thedatabase.The

translation

of

each

local

schema

into

thecommon

data

model.The

integration

of

the

local

schemata

into

acommon

global

schema.bottom-up

design

requires(异构情况下)Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.19DDB

设计的两个问题FragmentaionHorizontal

FragmentationVertical

fragmentationAllocation通常分片设计和分配设计需要统筹考

虑Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.20Horizontal

FragmentationPrimary

fragmentation

初级分片Derived

horizontal

fragmentation导出

分片Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.21水平分片原则若

R⇔F={F₁

,F₂

,…,Fn},则完整性

对于每一个元组t

∈R,3F₁

∈F

使得t

∈F不相交性

对Vt

∈F,-3Fj

使得

t

∈Fj,

i≠j可重构性操作是并し(可以忽略,因为完整性就蕴含着)R=U{F₁

,F₂

,..,

Fn}Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.22水平分片一例例子EMP(E#,

NAME,

DEPT,

JOB,

SAL,TEL,..)DEPT={1,2}

JOB={‘P',

‘-P’}假定,应用经常查询的内容是属于部门1且是程序员的职员。

(80/20原则)则可能有的水平分片限定

(Qualification)P={

DEPT=1}P={DEPT=1,JOB=‘P’}P={DEPT=1,JOB=‘P’,SAL>500}Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.23"手工"检查!e.g.,

F1=Oloc='Sa’E;F₂

=Oloc=‘Sb'E生成具有满足分段原则的predicate谓词Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.24如何保证分片原则谓词:用来执行分片选择操作的条件1.A

simple

predicate

简单谓词:Attribute

=

value

eg.:

DEPT=12.A

minterm

predicate

(

小项谓词)

y:

给定简单谓词集P={p₁,p₂

…pn},y=△piepP₁*

也既是P₁*AP₂*A..APn*where(p*=p;or

p₁*=NOT

p;)and

y≠false3.A

fragment

is

the

set

of

all

tuples

for

which

a

minterm

predicate

holds.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.25一些定义谓词生成过程找到常用的AP查询的simple

predicate

(A,θ

Value)

诸如:A<10,A>5,Loc

=Sa,Loc

=Sb生

"

"

词消除可能出现的无用谓词Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.26Global

relation

EMP(EMPNUM,

NAME,

SAL,

TAX,

MGRNUM,

DEPTNUM)Assume:

some

important

APs

require

information

about

employeeswho

are

members

of

department;

other

important

APs

which

require

onlythe

data

of

employees

who

areprogrammers;

these

last

APs

can

be

issued

at

any

site,

and

reference

all

programmers

with

the

same

probability.Assume:

that

there

are

only

two

departments,

1

and

2;

thus,DEPT=1→DEPT≠

2,and

vice

versa.Two

simple

predicates

are

DEPT=1

and

JOB="P"(programmer).

The

minterm

predicates

for

these

two

predicates

areDEPT=1

AND

JOB="P"DEPT=1

AND

JOB≠"P"DEPT≠1

AND

JOB="P"DEPT≠1

AND

JOB

≠"P"Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.27ExampleAll

the

above

simple

predicates

are

relevante.g.

SAL>50

is

not

a

relevant

predicate;Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.28讨论LetP={pi,p₂,...,pn}

be

a

set

ofsimple

predicates.为了正确有效进行分片,则P必须是completeand

minimal1.P

of

predicates

is

complete

if

any

two

tuplesbelonging

to

the

same

fragment

are

referencedwith

the

same

probability

by

any

application.2.P

is

minimal

if

all

its

predicates

are

relevant.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.29complete

and

minimalP₁={DEPT

=1}is

not

complete

-the

applicationsreference

tuples

of

programmers

with

a

greaterprobability

within

each

fragment

produced

by

P₁

.P₂={DEPT=1,JOB="P"}is

complete

and

minimal.P₃={DEPT=1,JOB="P",SAL>50}iscomplete

butnot

minimal,

since

SAL>50

is

not

relevant.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.30ExampleFragmentation

MethodBasis

Consider

a

predicate

p,

which

partitions

the

tuples

of

R

into

two

parts

which

are

referenced

differently

by

at

leastone

application.

Let

P

={p₁}MethodConsider

a

new

simple

predicate

p;

which

partitions

at

least

onefragment

of

P

into

two

parts

which

are

referenced

in

a

different

way

by

at

least

one

application.Set

P—Pup;.

Eliminate

nonrelevant

predicates

from

P.Repeat

this

step

until

the

set

of

the

minterm

fragments

of

P

iscomplete.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.31Consider:

SAL>50:

if

programmers

have

average

salary

greater

than

50,

itdetermines

two

sets

of

employees

who

are

referenced

differently

by

theapplications.P1={SAL>50}Consider:

DEPT

=1;

thispredicate

is

relevant

and

is

added

to

the

previous

one,

P2={

SAL>50,DEPT=1}.Consider:

JOB="P".

The

predicate

is

relevant,set

P3={SAL>50,DEPT=1,JOB="P"}.then

SAL>

50

is

not

relevant

in

P3,

thus,

the

final

set

P4={DEPT=1,JOB=

"P"},

which

is

complete

and

minimal.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.32ExampleA

"reasonable"

way1.Concentrating

on

a

few

importantapplications2.Not

distinguishing

fragments

whosefeatures

are

very

similarDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.331.Administrative

applications,

issued

only

at

sites

1

and

3;administrative

applications

about

departments

in

the

northernarea

are

issued

at

site

1;

those

about

departments

in

the

southern

area

are

issued

at

site

3.2.Applications

about

work

conducted

at

each

department;

theycan

be

issued

at

any

department,

but

they

reference

tuples

of

the

departments

which

are

closer

to

their

site

of

origin

with

higher

probability

than

the

tuples

of

other

departments.DEPT

(DEPTNUM,NAME,AREA,MGRNUM)

importantapplications:Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.34P1:DEPTNUM<10P2:10<DEPTNUM<20P3:DEPTNUM

>20P4:AREA

="North"P5:AREA

="South"Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.35Set

of

predicates:Y1:DEPTNUM<10Y2:

DEPTNUM<10Y3:10<DEPTNUM<20Y4:10<DEPTNUM<20Y5:

DEPTNUM>10Y6:

DEPTNUM>10AREA="North"AREA="South“AREA="North"

AREA="South"AREA

="North"AREA="South"andandandandandandDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.36可能的谓词限定Reduce,

eg.AREA="North"implies

that

DEPNUM>20y1:DEPTNUM≤10y2:(10<DEPTNUM≤20)AND

(AREA

="North")y3:(10<DEPTNUM≤20)AND

(AREA

=

"South")y4:DEPTNUM>20Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.37Derived

Horizontal

Fragmentation导出分片DHF:

从另一个关系的属性性质或水平分片

推导出来采

用DHF

可以使分片之间的join操作更加容易Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.38DHF

分片exampleeg:SC(S#,C#,GRADE)S(S#,SNAME.

AGE,

SEX)分段设计Define

fragment

SC1

asSelect

SC.S#,C#,GRADE

From

SC,SWhere

SC.S#=S.S#

and

SEX=‘M'Define

fragment

SC2

asSelect

SC.S#,C#,GRADE

From

SC,SWhere

SC.S#=S.S#

and

SEX=‘F'Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.39distributed

joinjoin

graphsTotalSimplepartitioned分布式数据库中的join

连接操作Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.40圆圈:数据分片无向边:两个分片之间有相同

属性值的元组存在Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.41连接图定义R

SJoin

graphA

join

graph

is

total

when

it

contains

all

possible

edgesbetween

fragments

of

R

and

S;完全连接图定义Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.42R

STotal

Join

graphA

reduced

join

graph

is

partitioned

if

the

graph

is

composedof

two

or

moresubgraphs

without

edges

between

themDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.43部分连接图定义Partitioned

Join

graphR

SA

reduced

join

graph

R

Sis

simple

if

it

ispartitioned

and

eachsubgraph

has

justone

edgeDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.44简单连接图定义Simple

Join

graphSUPPLY

(SNUM,PNUM,

DEPTNUM,

QUAN)

SUPPLY

is

always

used

together

with

another

relationSome

applications

require

information

about

supplies

of

given

suppliers-join

SUPPLY

and

SUPPLIER

on

the

SNUM

attribute.

The

other

applications

require

information

about

supplies

at

a

given

department-join

SUPPLY

and

DEPT

on

the

DEPTNUMattribute.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.45General

example

(continued)DEPT

is

horizontally

fragmented

according

to

values

taken

by

the

attribute

DEPTNUMSUPPLIER

is

horizontally

fragmented

according

to

values

taken

by

the

attribute

SNUM.There

are

two

possible

derived

fragmentationsSUPPLYone

through

the

semi-join

with

SUPPLIER

on

SNUMone

through

the

semi-join

with

DEPT

on

DEPTNUMboth

of

them

are

correct.The

selection

between

these

alternatives

should

take

intoaccount

which

one

of

the

two

corresponding

joins

is

more

used

by

applications.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.46Vertical

FragmentationVertical

FragmentationVertical

Clusteringobjective:将某个AP

频繁使用的属性聚

集在一起,当有多个APs

有时候需要权

衡利弊。Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.47Vertical

Fragmentation为一全局关系R

进行分片是不容易的,因为随着R

属性数目增加,可能的分片数目也大幅度增加

(the

number

of

possible

clusters

is

even

larger.)两种启发式方法(heuristic

approaches)The

split

approach

in

which

global

relations

are

progressively

splitinto

fragments分裂法The

grouping

approach

in

which

attributes

are

progressively

aggregated

to

constitute

fragments成组法Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.48EMP(EMPNUM,NAME,

SAL,TAX,MGRNUM,DEPTNUM)APP1

、Administrative

applications,

concentrated

at

site

3,

requiring

NAME,

SAL,

and

TAX

of

employees.APP2

、Applications

about

work

conducted

at

each

department,requiring

NAME,

MGRNUM,

and

DEPTNUM

of

employees;

these

applications

are

issued

at

all

sites,

and

reference

tuples

of

employees

in

the

same

group

of

departments

with

80

percent

probability.Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.49General

example

(continued)EMPI(EMPNUM,NAME,TAX,

SAL)EMP2(EMPNUM,

NAME,

MGRNUM,DEPTNUM)Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.50结果the

simplest

ways:1.

Applying

horizontal

fragmentation

to

vertical

fragments2.Applying

vertical

fragmentation

to

horizontal

fragmentsDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.51Mixed

Fragmentationnonredundant

allocation(

easier)The

simplest

method

is

a

"best-fit"

(最佳适应

)

approach;

a

measure

is

associated

with

eachpossible

allocation,

and

the

site

with

the

best

measure

is

selected.redundant

allocationReplication

introduces

further

complexity,例如复

制程度,如何检索和更新等Distributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.52THE

ALLOCATION

OF

FRAGMENTS讨论在进行redundant

allocation冗余分配时,通常先求nonredundant

allocation非冗余分配的解,在此基础上再求

redundant

allocation冗余分配的解The

"additional

replication"

method

is

a

typical

heuristicapproach;

with

this

method,

it

is

possible

to

take

intoaccount

that

the

increase

in

the

degree

of

redundancy

isDistributed

DBMS

University

of

Shanghai

for

Science

and

Technology

Page

2.53progressively

less

beneficial.Two

methods

(for

reduntant

allocation):1.

Determine

the

set

of

all

sites

where

the

benefit

of

allocatingone

copy

of

the

fragment

is

higher

than

the

cost,

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论