




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Chapter 13: Query ProcessingChapter 13: Query ProcessingOverview Measures of Query CostSelection Operation Sorting Join Operation Other OperationsEvaluation of ExpressionsBasic Steps in Query Processing1.Parsing and translation2.Optimization3.EvaluationBasic Steps in Query Processing (Cont.)Parsing
2、and translationtranslate the query into its internal form. This is then translated into relational algebra.Parser checks syntax, verifies relationsEvaluationThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.Basic Steps in Query Process
3、ing : OptimizationA relational algebra expression may have many equivalent expressionsE.g., balance2500(balance(account) is equivalent to balance(balance2500(account)Each relational algebra operation can be evaluated using one of several different algorithmsCorrespondingly, a relational-algebra expr
4、ession can be evaluated in many ways. Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.E.g., can use an index on balance to find accounts with balance v; do not use indexA7 (secondary index, comparison). For A V(r) use index to find first index entry v and sc
5、an index sequentially from there, to find pointers to records.For AV (r) just scan leaf pages of index finding pointers to records, till first entry vIn either case, retrieve records that are pointed to requires an I/O for each record Linear file scan may be cheaperJoin OperationSeveral different al
6、gorithms to implement joinsNested-loop joinBlock nested-loop joinIndexed nested-loop joinMerge-joinHash-joinChoice based on cost estimateExamples use the following informationNumber of records of customer: 10,000 depositor: 5000Number of blocks of customer: 400 depositor: 100Nested-Loop JoinTo compu
7、te the theta join r sfor each tuple tr in r do beginfor each tuple ts in s do begintest pair (tr,ts) to see if they satisfy the join condition if they do, add tr ts to the result.endendr is called the outer relation and s the inner relation of the join.Requires no indices and can be used with any ki
8、nd of join condition.Expensive since it examines every pair of tuples in the two relations. Nested-Loop Join (Cont.)In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr bs + br block transfers, plus nr + br seeksIf both relations fit entirely
9、 in memory, in the best case. Reduces cost to br + bs block transfers and 2 seeksAssuming worst case memory availability cost estimate iswith depositor as outer relation:5000 400 + 100 = 2,000,100 block transfers,5000 + 100 = 5100 seeks with customer as the outer relation 10000 100 + 400 = 1,000,400
10、 block transfers and 10,400 seeksIn the best case, the cost estimate will be 500 block transfers.Block nested-loops algorithm (next slide) is preferable.Block Nested-Loop JoinVariant of nested-loop join in which every block of inner relation is paired with every block of outer relation.for each bloc
11、k Br of r do beginfor each block Bs of s do beginfor each tuple tr in Br do beginfor each tuple ts in Bs do beginCheck if (tr,ts) satisfy the join condition if they do, add tr ts to the result.endendendendBlock Nested-Loop Join (Cont.)Worst case estimate: br bs + br block transfers + 2 * br seeksEac
12、h block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relationBest case: br + bs block transfers + 2 seeks.Improvements to nested loop and block nested loop algorithms:In block nested-loop, use M 2 disk blocks as blocking unit
13、for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output Cost = br / (M-2) bs + br block transfers + 2 br / (M-2) seeksIf equi-join attribute forms a key or inner relation, stop inner loop on first matchScan inner loop forward and backward al
14、ternately, to make use of the blocks remaining in buffer (with LRU replacement)Use index on inner relation if available (next slide)Indexed Nested-Loop JoinIndex lookups can replace file scans ifjoin is an equi-join or natural join andan index is available on the inner relations join attributeCan co
15、nstruct an index just to compute a join.For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.Worst case: buffer has space for only one page of r, one page of index(for s), and, for each tuple in r, we perform an index lookup on
16、 s.Cost of the join: br (tT + tS) + nr cWhere c is the cost of traversing index and fetching all matching s tuples for one tuple or rc can be estimated as cost of a single selection on s using the join condition.If indices are available on join attributes of both r and s,use the relation with fewer
17、tuples as the outer relation.Example of Indexed Nested-Loop Join CostsCompute depositor customer, with depositor as the outer relation.Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.Since customer has 10,000 tuples, the hei
18、ght of the tree is 4, and one more access is needed to find the actual datadepositor has 5000 tuplesCost of block nested loops join400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks assuming worst case memory may be significantly less with more memory Cost of indexed nested loops join100 +
19、 5000 * 5 = 25,100 block transfers and seeks.CPU cost likely to be less than that for block nested loops join Merge-JoinSort both relations on their join attribute (if not already sorted on the join attributes).Merge the sorted relations to join themJoin step is similar to the merge stage of the sor
20、t-merge algorithm. Main difference is handling of duplicate values in join attribute every pair with same value on join attribute must be matchedDetailed algorithm in bookMerge-Join (Cont.)Can be used only for equi-joins and natural joinsEach block needs to be read only once (assuming all tuples for
21、 any given value of the join attributes fit in memoryThus the cost of merge join is: br + bs block transfers + br / bb + bs / bb seeks+ the cost of sorting if relations are unsorted.bb is the number of buffer for the relations.Hash-JoinApplicable for equi-joins and natural joins.A hash function h is
22、 used to partition tuples of both relations h maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partitions of r tuplesEach tuple tr r is put in partition ri where i = h(tr JoinAttrs).r0, r1. . ., rn denote
23、s partitions of s tuplesEach tuple ts s is put in partition si, where i = h(ts JoinAttrs).Note: In book, ri is denoted as Hri, si is denoted as Hsi and n is denoted as nh. Hash-Join (Cont.)Hash-Join (Cont.)r tuples in ri need only to be compared with s tuples in si Need not be compared with s tuples
24、 in any other partition, since:an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.If that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.Hash-Join Algorithm1.Partition the relation s using hashing function h.
25、 When partitioning a relation, one block of memory is reserved as the output buffer for each partition.2.Partition r similarly.3.For each i:(a)Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.(
26、b)Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes.The hash-join of r and s is computed as follows.Relation s is called the build input and r is called the probe input.Ha
27、sh-Join algorithm (Cont.)The value n and the hash function h is chosen such that each si should fit in memory.Typically n is chosen as bs/M * f where f is a “fudge factor”, typically around 1.2The probe relation partitions si need not fit in memoryRecursive partitioning required if number of partiti
28、ons n is greater than number of pages M of memory.If M2bs, ,no need recursive partitioning Cost of Hash-JoinIf recursive partitioning is not required: cost of hash join is 3(br + bs) +4 nh block transfers + 2( br / bb + bs / bb) seeks often 4 nh can be omitted. If the entire build input can be kept
29、in main memory no partitioning(nh=0) is requiredCost estimate goes down to br + bs.plus 2 seeksExample of Cost of Hash-JoinAssume that memory size is 20 blocksbdepositor= 100 and bcustomer = 400.assuming 3 buffers for these two relations. Therefore total cost, ignoring cost of writing partially fill
30、ed blocks:3(100 + 400) = 1500 block transfers +2( 100/3 + 400/3) = 336 seekscustomer depositorMaterializationMaterialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.E.g.,
31、 in figure below, compute and storethen compute the store its join with customer, and finally compute the projections on customer-name. Materialization (Cont.)Materialized evaluation is always applicableCost of writing results to disk and reading them back can be quite highOur cost formulas for oper
32、ations ignore cost of writing results to disk, soOverall cost = Sum of costs of individual operations + cost of writing intermediate results to diskDouble buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filledreduces execution timePi
33、peliningPipelined evaluation : evaluate several operations simultaneously, passing the results of one operation on to the next.E.g., in previous expression tree, dont store result of instead, pass tuples directly to the join. Similarly, dont store result of join, pass tuples directly to projection. Much cheaper than materializa
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 小企业劳动用工合同
- 夏令营代理商合作协议新
- 买卖合作协议合同
- 产品销售数据类表格
- 美甲店装修施工方案模板
- TCSG 13-2024 高纯工业品氟化锂
- 《大数据技术导论》-课程标准
- 布帘施工方案
- 水利水电施工方案
- 预制桩钢平台基础施工方案
- 数学家华罗庚课件
- 彩票风险评估与控制
- 片上互连优化与总线接口设计
- 2024年中国包子行业发展前景及投资前景预测报告(智研咨询)
- 西方经济学考试题库(含参考答案)
- 2024年全国职业院校技能大赛高职组(婴幼儿健康养育照护赛项)考试题库(含答案)
- 学校食堂餐饮服务投标方案(技术方案)
- 国企集团公司各岗位廉洁风险点防控表格(廉政)范本
- 中医师承跟师笔记50篇
- 四年级语文《乡下人家》作业设计
- 儿童健康产业行业研究报告
评论
0/150
提交评论