版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
分布式数据库设计A
FRAMEWORK
FOR
DISTRIBUTEDDATABASE
DESIGN
(概述)THE
DESIGN
OF
DATABASEFRAGMENTATION
(分片设计)THE
ALLOCATION
OF
FRAGMENTS
(分配设计)University
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.1Level
of
sharing
共享维
不共享,数据共享,数据+程序共享Access
pattern
behavior访问模式维
静态模式,动态
模式(分布式数据库设计与查询处理)Level
of
knowledge
访问模式知识维用户完全已知或部
分已知访问模式分布式系统设计的三维Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.2存取模式动态静态部分信息数据知识级别数据+程序完整信息共享Dimensions
of
the
ProblemUniversity
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.3集中式数据库设计1.
Designing
the
"conceptual
schema"which
describes
the
integrated
database2.
Designing
the
"physical
database,"i.e.,
mapping
the
conceptual
schema
to
storage
areas
and
determining
appropriate
accessmethods.University
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.4分布式数据库设计的特殊要求+3.Designing
the
fragmentation.+4.
Designing
the
allocation
of
fragments,i.e.mapped
to
physical
images;
also
the
replication
of
fragments
is
determined.University
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.5关于分片和分配的几点注意Fragmentation
design
been
partially
analyzed
in
centralized
systemswith
multiple
storage
devices.The
allocation
problem
has
been
studied
as
the
"file
allocation
problem."The
distinction
between
two
problems
is
conceptually
relevantone
deals
with
the
"logical
criteria"
which
motivate
thefragmentation
of
a
global
relationone
deals
with
the
"physical"
placement
of
data
at
the
various
sites.这两个问题通常是相互关联的,不可能独立地解决它们而
能确定最优的fragmentaion和allocationUniversity
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.6关于APPLICATION考虑因素:分布式数据库设计包括:分布式数据库设计和相应的分布式应用设计1.
The
site
from
which
the
application
is
issued
(site
of
origin
of
theapplication).2.
The
frequency
of
activation
of
the
application
(i.e.,在单位时间内被激活
的次数);
applications
which
can
be
issued
at
multiple
sites,
we
need
to
know
the
frequency
ofactivation
of
each
application
at
each
site.3.
The
number,
type,
and
the
statistical
distribution
of
accesses
made
byeach
application
to
each
required
data
"object."University
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.7设计目标
(Objectives)Processing
locality数据处理的本地性Availability
and
reliability
of
distributed
data
分
布式数据的有效性和可靠性冗余控制Workload
distribution
工作负荷的合理分布Storage
costs
and
availability存储能力和费用Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.8Processing
localityMaximizeprocessing
locality
corresponds
to
the
simple
principle
ofplacing
data
as
close
as
possible
to
the
applications
which
usethem.Maximizing
processing
locality
(minimizing
remote
references)
canbe
done
by
adding
the
number
of
local
and
remote
referencescorresponding
to
each
candidate
fragmentation
and
fragment,
allocation,
and
selecting
the
best
solution
among
them.The
advantage
of
complete
locality
is
not
only
the
reduction
of
remoteaccesses,
but
also
the
increased
simplicity
in
controlling
the
executionof
theapplication.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.9Availability
and
reliability
of
distributed
dataA
high
degree
of
availability
for
read-only
applications
isachieved
by
storing
multiple
copies
of
the
same
information;the
system
must
be
able
to
switch
to
an
alternative
copy
when
the
one
that
should
be
accessed
under
normal
conditions
is
not
available.Reliability
is
also
achieved
by
storing
multiple
copies
of
the
same
information
-
possible
to
recover
from
crashes
orfrom
the
physical
destruction
of
one
of
the
copies
by
using
the
other
still
available
copies.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.10Workload
distributionAn
important
feature
of
distributed
computersystems.To
take
advantage
of
the
different
powers
or
utilizations
of
computers
at
each
site,Maximize
the
degree
of
parallelism
of
execution
of
applications.workload
distribution
might
negatively
affectprocessing
locality
-
to
consider
the
trade-offDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.11Storage
costs
and
availabilityShould
reflect
the
cost
and
availability
of
storage
at
the
different
sites.
It
is
possible
to
have
specialized
sites
in
the
network
for
data
storage,
orconversely
to
have
sites
which
do
not
support
mass
storage
at
all.通常存储的费用并不是非常重要(Compared
to
CPU,I/O,
Transmission
of
network).Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.12设计方法Top-Down
Approach自顶向下Bottom-Up
Approach自底向上Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.13Top-down
approach已
有DB..如何分割数据及如何分配这些数据到不同站点过程start
by
designing
the
global
schemadesigning
the
fragmentation
of
the
databasethen
by
allocating
the
fragments
to
the
sites,
creating
the
physicalimagesThe
approach
is
completed
by
performing,
at
each
site,
the
"physical
design"
of
the
data
which
are
allocated
to
it.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.14AccessInformationDistributionDesignObjectivesUser
InputView
IntegrationTop-Down
DesignRequirements
AnalysisConceptual
DesignPhysical
DesignView
DesignUser
InputLCS'sLIS'sGCSES's特点能先看到雏形问题:
When
the
distributed
database
is
developed
as
the
aggregation
ofexisting
databases,
it
is
not
easy
to
follow
the
top-down
approach.
The
global
schema
is
often
produced
as
a
compromise
between
existing
data
descriptions.Distributed
DBMSUniversity
of
Shanghai
for
Science
and
Technology
Page
2.16Bottom-up
approachExisting
databases
are
aggregated(还可能是异构heterogeneous
或完全自治autonomous),
无设计问题(信息集成)!Based
on
the
integration
of
existing
schemata
into
a
single,
global
schema.By
integration,
the
merging
of
common
datadefinitions
and
the
resolution
of
conflicts
among
different
representations
given
to
the
same
data.University
of
Shanghai
for
Science
and
TechnologyDistributed
DBMSPage
2.17bottom-up
approachHorizontal
fragments
of
a
same
global
relation
must
have
thesame
relation
schema
-
easily
enforced
in
a
top-downdesign,
while
it
is
difficult
to
"discover"
it.
The
integrationprocess
should
attempt
to
modify
the
definitions
of
localrelations,
so
that
they
can
be
regarded
as
horizontal
fragments
of
a
common,
global
relation.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.18The
selection
of
a
common
database
modelfor
describing
the
global
schema
of
thedatabase.The
translation
of
each
local
schema
into
thecommon
data
model.The
integration
of
the
local
schemata
into
acommon
global
schema.bottom-up
design
requires(异构情况下)Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.19DDB
设计的两个问题FragmentaionHorizontal
FragmentationVertical
fragmentationAllocation通常分片设计和分配设计需要统筹考
虑Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.20Horizontal
FragmentationPrimary
fragmentation
初级分片Derived
horizontal
fragmentation导出
分片Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.21水平分片原则若
R⇔F={F₁
,F₂
,…,Fn},则完整性
对于每一个元组t
∈R,3F₁
∈F
使得t
∈F不相交性
对Vt
∈F,-3Fj
使得
t
∈Fj,
i≠j可重构性操作是并し(可以忽略,因为完整性就蕴含着)R=U{F₁
,F₂
,..,
Fn}Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.22水平分片一例例子EMP(E#,
NAME,
DEPT,
JOB,
SAL,TEL,..)DEPT={1,2}
JOB={‘P',
‘-P’}假定,应用经常查询的内容是属于部门1且是程序员的职员。
(80/20原则)则可能有的水平分片限定
(Qualification)P={
DEPT=1}P={DEPT=1,JOB=‘P’}P={DEPT=1,JOB=‘P’,SAL>500}Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.23"手工"检查!e.g.,
F1=Oloc='Sa’E;F₂
=Oloc=‘Sb'E生成具有满足分段原则的predicate谓词Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.24如何保证分片原则谓词:用来执行分片选择操作的条件1.A
simple
predicate
简单谓词:Attribute
=
value
eg.:
DEPT=12.A
minterm
predicate
(
小项谓词)
y:
给定简单谓词集P={p₁,p₂
…pn},y=△piepP₁*
也既是P₁*AP₂*A..APn*where(p*=p;or
p₁*=NOT
p;)and
y≠false3.A
fragment
is
the
set
of
all
tuples
for
which
a
minterm
predicate
holds.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.25一些定义谓词生成过程找到常用的AP查询的simple
predicate
(A,θ
Value)
诸如:A<10,A>5,Loc
=Sa,Loc
=Sb生
成
"
小
项
"
谓
词消除可能出现的无用谓词Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.26Global
relation
EMP(EMPNUM,
NAME,
SAL,
TAX,
MGRNUM,
DEPTNUM)Assume:
some
important
APs
require
information
about
employeeswho
are
members
of
department;
other
important
APs
which
require
onlythe
data
of
employees
who
areprogrammers;
these
last
APs
can
be
issued
at
any
site,
and
reference
all
programmers
with
the
same
probability.Assume:
that
there
are
only
two
departments,
1
and
2;
thus,DEPT=1→DEPT≠
2,and
vice
versa.Two
simple
predicates
are
DEPT=1
and
JOB="P"(programmer).
The
minterm
predicates
for
these
two
predicates
areDEPT=1
AND
JOB="P"DEPT=1
AND
JOB≠"P"DEPT≠1
AND
JOB="P"DEPT≠1
AND
JOB
≠"P"Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.27ExampleAll
the
above
simple
predicates
are
relevante.g.
SAL>50
is
not
a
relevant
predicate;Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.28讨论LetP={pi,p₂,...,pn}
be
a
set
ofsimple
predicates.为了正确有效进行分片,则P必须是completeand
minimal1.P
of
predicates
is
complete
if
any
two
tuplesbelonging
to
the
same
fragment
are
referencedwith
the
same
probability
by
any
application.2.P
is
minimal
if
all
its
predicates
are
relevant.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.29complete
and
minimalP₁={DEPT
=1}is
not
complete
-the
applicationsreference
tuples
of
programmers
with
a
greaterprobability
within
each
fragment
produced
by
P₁
.P₂={DEPT=1,JOB="P"}is
complete
and
minimal.P₃={DEPT=1,JOB="P",SAL>50}iscomplete
butnot
minimal,
since
SAL>50
is
not
relevant.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.30ExampleFragmentation
MethodBasis
Consider
a
predicate
p,
which
partitions
the
tuples
of
R
into
two
parts
which
are
referenced
differently
by
at
leastone
application.
Let
P
={p₁}MethodConsider
a
new
simple
predicate
p;
which
partitions
at
least
onefragment
of
P
into
two
parts
which
are
referenced
in
a
different
way
by
at
least
one
application.Set
P—Pup;.
Eliminate
nonrelevant
predicates
from
P.Repeat
this
step
until
the
set
of
the
minterm
fragments
of
P
iscomplete.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.31Consider:
SAL>50:
if
programmers
have
average
salary
greater
than
50,
itdetermines
two
sets
of
employees
who
are
referenced
differently
by
theapplications.P1={SAL>50}Consider:
DEPT
=1;
thispredicate
is
relevant
and
is
added
to
the
previous
one,
P2={
SAL>50,DEPT=1}.Consider:
JOB="P".
The
predicate
is
relevant,set
P3={SAL>50,DEPT=1,JOB="P"}.then
SAL>
50
is
not
relevant
in
P3,
thus,
the
final
set
P4={DEPT=1,JOB=
"P"},
which
is
complete
and
minimal.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.32ExampleA
"reasonable"
way1.Concentrating
on
a
few
importantapplications2.Not
distinguishing
fragments
whosefeatures
are
very
similarDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.331.Administrative
applications,
issued
only
at
sites
1
and
3;administrative
applications
about
departments
in
the
northernarea
are
issued
at
site
1;
those
about
departments
in
the
southern
area
are
issued
at
site
3.2.Applications
about
work
conducted
at
each
department;
theycan
be
issued
at
any
department,
but
they
reference
tuples
of
the
departments
which
are
closer
to
their
site
of
origin
with
higher
probability
than
the
tuples
of
other
departments.DEPT
(DEPTNUM,NAME,AREA,MGRNUM)
importantapplications:Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.34P1:DEPTNUM<10P2:10<DEPTNUM<20P3:DEPTNUM
>20P4:AREA
="North"P5:AREA
="South"Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.35Set
of
predicates:Y1:DEPTNUM<10Y2:
DEPTNUM<10Y3:10<DEPTNUM<20Y4:10<DEPTNUM<20Y5:
DEPTNUM>10Y6:
DEPTNUM>10AREA="North"AREA="South“AREA="North"
AREA="South"AREA
="North"AREA="South"andandandandandandDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.36可能的谓词限定Reduce,
eg.AREA="North"implies
that
DEPNUM>20y1:DEPTNUM≤10y2:(10<DEPTNUM≤20)AND
(AREA
="North")y3:(10<DEPTNUM≤20)AND
(AREA
=
"South")y4:DEPTNUM>20Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.37Derived
Horizontal
Fragmentation导出分片DHF:
从另一个关系的属性性质或水平分片
推导出来采
用DHF
可以使分片之间的join操作更加容易Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.38DHF
分片exampleeg:SC(S#,C#,GRADE)S(S#,SNAME.
AGE,
SEX)分段设计Define
fragment
SC1
asSelect
SC.S#,C#,GRADE
From
SC,SWhere
SC.S#=S.S#
and
SEX=‘M'Define
fragment
SC2
asSelect
SC.S#,C#,GRADE
From
SC,SWhere
SC.S#=S.S#
and
SEX=‘F'Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.39distributed
joinjoin
graphsTotalSimplepartitioned分布式数据库中的join
连接操作Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.40圆圈:数据分片无向边:两个分片之间有相同
属性值的元组存在Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.41连接图定义R
SJoin
graphA
join
graph
is
total
when
it
contains
all
possible
edgesbetween
fragments
of
R
and
S;完全连接图定义Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.42R
STotal
Join
graphA
reduced
join
graph
is
partitioned
if
the
graph
is
composedof
two
or
moresubgraphs
without
edges
between
themDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.43部分连接图定义Partitioned
Join
graphR
SA
reduced
join
graph
R
Sis
simple
if
it
ispartitioned
and
eachsubgraph
has
justone
edgeDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.44简单连接图定义Simple
Join
graphSUPPLY
(SNUM,PNUM,
DEPTNUM,
QUAN)
SUPPLY
is
always
used
together
with
another
relationSome
applications
require
information
about
supplies
of
given
suppliers-join
SUPPLY
and
SUPPLIER
on
the
SNUM
attribute.
The
other
applications
require
information
about
supplies
at
a
given
department-join
SUPPLY
and
DEPT
on
the
DEPTNUMattribute.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.45General
example
(continued)DEPT
is
horizontally
fragmented
according
to
values
taken
by
the
attribute
DEPTNUMSUPPLIER
is
horizontally
fragmented
according
to
values
taken
by
the
attribute
SNUM.There
are
two
possible
derived
fragmentationsSUPPLYone
through
the
semi-join
with
SUPPLIER
on
SNUMone
through
the
semi-join
with
DEPT
on
DEPTNUMboth
of
them
are
correct.The
selection
between
these
alternatives
should
take
intoaccount
which
one
of
the
two
corresponding
joins
is
more
used
by
applications.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.46Vertical
FragmentationVertical
FragmentationVertical
Clusteringobjective:将某个AP
频繁使用的属性聚
集在一起,当有多个APs
有时候需要权
衡利弊。Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.47Vertical
Fragmentation为一全局关系R
进行分片是不容易的,因为随着R
的
属性数目增加,可能的分片数目也大幅度增加
(the
number
of
possible
clusters
is
even
larger.)两种启发式方法(heuristic
approaches)The
split
approach
in
which
global
relations
are
progressively
splitinto
fragments分裂法The
grouping
approach
in
which
attributes
are
progressively
aggregated
to
constitute
fragments成组法Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.48EMP(EMPNUM,NAME,
SAL,TAX,MGRNUM,DEPTNUM)APP1
、Administrative
applications,
concentrated
at
site
3,
requiring
NAME,
SAL,
and
TAX
of
employees.APP2
、Applications
about
work
conducted
at
each
department,requiring
NAME,
MGRNUM,
and
DEPTNUM
of
employees;
these
applications
are
issued
at
all
sites,
and
reference
tuples
of
employees
in
the
same
group
of
departments
with
80
percent
probability.Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.49General
example
(continued)EMPI(EMPNUM,NAME,TAX,
SAL)EMP2(EMPNUM,
NAME,
MGRNUM,DEPTNUM)Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.50结果the
simplest
ways:1.
Applying
horizontal
fragmentation
to
vertical
fragments2.Applying
vertical
fragmentation
to
horizontal
fragmentsDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.51Mixed
Fragmentationnonredundant
allocation(
easier)The
simplest
method
is
a
"best-fit"
(最佳适应
)
approach;
a
measure
is
associated
with
eachpossible
allocation,
and
the
site
with
the
best
measure
is
selected.redundant
allocationReplication
introduces
further
complexity,例如复
制程度,如何检索和更新等Distributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.52THE
ALLOCATION
OF
FRAGMENTS讨论在进行redundant
allocation冗余分配时,通常先求nonredundant
allocation非冗余分配的解,在此基础上再求
redundant
allocation冗余分配的解The
"additional
replication"
method
is
a
typical
heuristicapproach;
with
this
method,
it
is
possible
to
take
intoaccount
that
the
increase
in
the
degree
of
redundancy
isDistributed
DBMS
University
of
Shanghai
for
Science
and
Technology
Page
2.53progressively
less
beneficial.Two
methods
(for
reduntant
allocation):1.
Determine
the
set
of
all
sites
where
the
benefit
of
allocatingone
copy
of
the
fragment
is
higher
than
the
cost,
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 《增强现实环境下基于体验视角的民俗博物馆展示设计研究》
- 《新消费产业动态跟踪及趋势洞察月报(2024年10月)》范文
- 《S市食品安全监管问题研究》
- 《柿子酒酿造工艺及风味特征的研究》
- 《临夏州易地搬迁农户后续发展问题研究》
- 2024年技师转会与聘用合同
- 《生物阴极强化废水中硫酸盐还原及协同去除锌、锰的研究》
- 《我国农业保险法律制度问题研究》
- 2024年数控加工中心项目资金需求报告代可行性研究报告
- 2024年高纯人造硅灰石项目资金需求报告代可行性研究报告
- 《春节的文化与习俗》课件
- 手机棋牌平台网络游戏商业计划书
- 学校体育与社区体育融合发展的研究
- 医疗机构高警示药品风险管理规范(2023版)
- 一年级体质健康数据
- 八年级物理(上)期中考试分析与教学反思
- 国家开放大学《财政与金融(农)》形考任务1-4参考答案
- 2023银行网点年度工作总结
- 工厂反骚扰虐待强迫歧视政策
- 计算机教室(微机室)学生上机使用记录
- FAI首件检验报告
评论
0/150
提交评论