数据库管理系统:ch13 Query Processing_第1页
数据库管理系统:ch13 Query Processing_第2页
数据库管理系统:ch13 Query Processing_第3页
数据库管理系统:ch13 Query Processing_第4页
数据库管理系统:ch13 Query Processing_第5页
已阅读5页,还剩29页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Database System Concepts, 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarshan13.2Database System Concepts - 5th Edition, Aug 27, 2005.nOverview nMeasures of Query CostnSelection Operation nSorting nJoin Operation nOther OperationsnEvaluation

2、 of ExpressionsSilberschatz, Korth and Sudarshan13.3Database System Concepts - 5th Edition, Aug 27, 2005.1. Parsing and translation2. Optimization3. EvaluationSilberschatz, Korth and Sudarshan13.4Database System Concepts - 5th Edition, Aug 27, 2005.nParsing and translationltranslate the query into i

3、ts internal form. This is then translated into relational algebra.lParser checks syntax, verifies relationsnEvaluationlThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.Silberschatz, Korth and Sudarshan13.5Database System Concepts - 5t

4、h Edition, Aug 27, 2005.nA relational algebra expression may have many equivalent expressionslE.g., balance2500(balance(account) is equivalent to balance(balance2500(account)nEach relational algebra operation can be evaluated using one of several different algorithmslCorrespondingly, a relational-al

5、gebra expression can be evaluated in many ways. nAnnotated expression specifying detailed evaluation strategy is called an evaluation-plan.lE.g., can use an index on balance to find accounts with balance v; do not use indexnA7 (secondary index, comparison). 4For A V(r) use index to find first index

6、entry v and scan index sequentially from there, to find pointers to records.4For AV (r) just scan leaf pages of index finding pointers to records, till first entry v4In either case, retrieve records that are pointed to requires an I/O for each record Linear file scan may be cheaperSilberschatz, Kort

7、h and Sudarshan13.13Database System Concepts - 5th Edition, Aug 27, 2005.nSeveral different algorithms to implement joinslNested-loop joinlBlock nested-loop joinlIndexed nested-loop joinlMerge-joinlHash-joinnChoice based on cost estimatenExamples use the following informationlNumber of records of cu

8、stomer: 10,000 depositor: 5000lNumber of blocks of customer: 400 depositor: 100Silberschatz, Korth and Sudarshan13.14Database System Concepts - 5th Edition, Aug 27, 2005.nTo compute the theta join r sfor each tuple tr in r do beginfor each tuple ts in s do begintest pair (tr,ts) to see if they satis

9、fy the join condition if they do, add tr ts to the result.endendnr is called the outer relation and s the inner relation of the join.nRequires no indices and can be used with any kind of join condition.nExpensive since it examines every pair of tuples in the two relations. Silberschatz, Korth and Su

10、darshan13.15Database System Concepts - 5th Edition, Aug 27, 2005.nIn the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr bs + br block transfers, plus nr + br seeksnIf both relations fit entirely in memory, in the best case.l Reduces cost to br

11、 + bs block transfers and 2 seeksnAssuming worst case memory availability cost estimate islwith depositor as outer relation:45000 400 + 100 = 2,000,100 block transfers,45000 + 100 = 5100 seeks lwith customer as the outer relation 410000 100 + 400 = 1,000,400 block transfers and 10,400 seeksnIn the b

12、est case, the cost estimate will be 500 block transfers.nBlock nested-loops algorithm (next slide) is preferable.Silberschatz, Korth and Sudarshan13.16Database System Concepts - 5th Edition, Aug 27, 2005.nVariant of nested-loop join in which every block of inner relation is paired with every block o

13、f outer relation.for each block Br of r do beginfor each block Bs of s do beginfor each tuple tr in Br do beginfor each tuple ts in Bs do beginCheck if (tr,ts) satisfy the join condition if they do, add tr ts to the result.endendendendSilberschatz, Korth and Sudarshan13.17Database System Concepts -

14、5th Edition, Aug 27, 2005.nWorst case estimate: br bs + br block transfers + 2 * br seekslEach block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relationnBest case: br + bs block transfers + 2 seeks.nImprovements to nested lo

15、op and block nested loop algorithms:lIn block nested-loop, use M 2 disk blocks as blocking unit for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output4 Cost = br / (M-2) bs + br block transfers + 2 br / (M-2) seekslIf equi-join attribute fo

16、rms a key or inner relation, stop inner loop on first matchlScan inner loop forward and backward alternately, to make use of the blocks remaining in buffer (with LRU replacement)lUse index on inner relation if available (next slide)Silberschatz, Korth and Sudarshan13.18Database System Concepts - 5th

17、 Edition, Aug 27, 2005.nIndex lookups can replace file scans ifljoin is an equi-join or natural join andlan index is available on the inner relations join attribute4Can construct an index just to compute a join.nFor each tuple tr in the outer relation r, use the index to look up tuples in s that sat

18、isfy the join condition with tuple tr.nWorst case: buffer has space for only one page of r, one page of index(for s), and, for each tuple in r, we perform an index lookup on s.nCost of the join: br (tT + tS) + nr clWhere c is the cost of traversing index and fetching all matching s tuples for one tu

19、ple or rlc can be estimated as cost of a single selection on s using the join condition.nIf indices are available on join attributes of both r and s,use the relation with fewer tuples as the outer relation.Silberschatz, Korth and Sudarshan13.19Database System Concepts - 5th Edition, Aug 27, 2005.nCo

20、mpute depositor customer, with depositor as the outer relation.nLet customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.nSince customer has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual

21、 datandepositor has 5000 tuplesnCost of block nested loops joinl400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks4 assuming worst case memory 4may be significantly less with more memoryn Cost of indexed nested loops joinl100 + 5000 * 5 = 25,100 block transfers and seeks.lCPU cost likely t

22、o be less than that for block nested loops join Silberschatz, Korth and Sudarshan13.20Database System Concepts - 5th Edition, Aug 27, 2005.1.Sort both relations on their join attribute (if not already sorted on the join attributes).2.Merge the sorted relations to join them1.Join step is similar to t

23、he merge stage of the sort-merge algorithm. 2.Main difference is handling of duplicate values in join attribute every pair with same value on join attribute must be matched3.Detailed algorithm in bookSilberschatz, Korth and Sudarshan13.21Database System Concepts - 5th Edition, Aug 27, 2005.nCan be u

24、sed only for equi-joins and natural joinsnEach block needs to be read only once (assuming all tuples for any given value of the join attributes fit in memorynThus the cost of merge join is: br + bs block transfers + br / bb + bs / bb seeks+ the cost of sorting if relations are unsorted.bb is the num

25、ber of buffer for the relations.Silberschatz, Korth and Sudarshan13.22Database System Concepts - 5th Edition, Aug 27, 2005.nApplicable for equi-joins and natural joins.nA hash function h is used to partition tuples of both relations nh maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the

26、 common attributes of r and s used in the natural join. lr0, r1, . . ., rn denote partitions of r tuples4Each tuple tr r is put in partition ri where i = h(tr JoinAttrs).lr0, r1. . ., rn denotes partitions of s tuples4Each tuple ts s is put in partition si, where i = h(ts JoinAttrs).nNote: In book,

27、ri is denoted as Hri, si is denoted as Hsi and n is denoted as nh. Silberschatz, Korth and Sudarshan13.23Database System Concepts - 5th Edition, Aug 27, 2005.Silberschatz, Korth and Sudarshan13.24Database System Concepts - 5th Edition, Aug 27, 2005.nr tuples in ri need only to be compared with s tup

28、les in si Need not be compared with s tuples in any other partition, since:lan r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.lIf that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.Silberschatz, Korth and S

29、udarshan13.25Database System Concepts - 5th Edition, Aug 27, 2005.1. Partition the relation s using hashing function h. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.2. Partition r similarly.3. For each i:(a) Load si into memory and build an in

30、-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.(b) Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attribu

31、tes.The hash-join of r and s is computed as follows.Relation s is called the build input and r is called the probe input.Silberschatz, Korth and Sudarshan13.26Database System Concepts - 5th Edition, Aug 27, 2005.nThe value n and the hash function h is chosen such that each si should fit in memory.lT

32、ypically n is chosen as bs/M * f where f is a “fudge factor”, typically around 1.2lThe probe relation partitions si need not fit in memorynRecursive partitioning required if number of partitions n is greater than number of pages M of memory.nIf M2bs, ,no need recursive partitioning Silberschatz, Kor

33、th and Sudarshan13.27Database System Concepts - 5th Edition, Aug 27, 2005.nIf recursive partitioning is not required: cost of hash join is 3(br + bs) +4 nh block transfers + 2( br / bb + bs / bb) seeks often 4 nh can be omitted. nIf the entire build input can be kept in main memory no partitioning(n

34、h=0) is requiredlCost estimate goes down to br + bs.plus 2 seeksSilberschatz, Korth and Sudarshan13.28Database System Concepts - 5th Edition, Aug 27, 2005.nAssume that memory size is 20 blocksnbdepositor= 100 and bcustomer = 400.nassuming 3 buffers for these two relations. Therefore total cost, igno

35、ring cost of writing partially filled blocks:l3(100 + 400) = 1500 block transfers +2( 100/3 + 400/3) = 336 seekscustomer depositorSilberschatz, Korth and Sudarshan13.29Database System Concepts - 5th Edition, Aug 27, 2005.nMaterialized evaluation: evaluate one operation at a time, starting at the low

36、est-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.nE.g., in figure below, compute and storethen compute the store its join with customer, and finally compute the projections on customer-name. )(2500accountbalanceSilberschatz, Korth and Sudars

37、han13.30Database System Concepts - 5th Edition, Aug 27, 2005.nMaterialized evaluation is always applicablenCost of writing results to disk and reading them back can be quite highlOur cost formulas for operations ignore cost of writing results to disk, so4Overall cost = Sum of costs of individual ope

38、rations + cost of writing intermediate results to disknDouble buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filledlreduces execution timeSilberschatz, Korth and Sudarshan13.31Database System Concepts - 5th Edition, Aug 27, 2005.nPi

39、pelined evaluation : evaluate several operations simultaneously, passing the results of one operation on to the next.nE.g., in previous expression tree, dont store result of linstead, pass tuples directly to the join. Similarly, dont store result of join, pass tuples directly to projection. nMuch cheaper than materialization: no need to store a temporary relation to disk.nPipelining may

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论