版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Chapter 13: Query ProcessingChapter 13: Query ProcessingOverview Measures of Query CostSelection Operation Sorting Join Operation Other OperationsEvaluation of ExpressionsBasic Steps in Query Processing1.Parsing and translation2.Optimization3.EvaluationBasic Steps in Query Processing (Cont.)Parsing
2、and translationtranslate the query into its internal form. This is then translated into relational algebra.Parser checks syntax, verifies relationsEvaluationThe query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.Basic Steps in Query Process
3、ing : OptimizationA relational algebra expression may have many equivalent expressionsE.g., balance2500(balance(account) is equivalent to balance(balance2500(account)Each relational algebra operation can be evaluated using one of several different algorithmsCorrespondingly, a relational-algebra expr
4、ession can be evaluated in many ways. Annotated expression specifying detailed evaluation strategy is called an evaluation-plan.E.g., can use an index on balance to find accounts with balance v; do not use indexA7 (secondary index, comparison). For A V(r) use index to find first index entry v and sc
5、an index sequentially from there, to find pointers to records.For AV (r) just scan leaf pages of index finding pointers to records, till first entry vIn either case, retrieve records that are pointed to requires an I/O for each record Linear file scan may be cheaperJoin OperationSeveral different al
6、gorithms to implement joinsNested-loop joinBlock nested-loop joinIndexed nested-loop joinMerge-joinHash-joinChoice based on cost estimateExamples use the following informationNumber of records of customer: 10,000 depositor: 5000Number of blocks of customer: 400 depositor: 100Nested-Loop JoinTo compu
7、te the theta join r sfor each tuple tr in r do beginfor each tuple ts in s do begintest pair (tr,ts) to see if they satisfy the join condition if they do, add tr ts to the result.endendr is called the outer relation and s the inner relation of the join.Requires no indices and can be used with any ki
8、nd of join condition.Expensive since it examines every pair of tuples in the two relations. Nested-Loop Join (Cont.)In the worst case, if there is enough memory only to hold one block of each relation, the estimated cost is nr bs + br block transfers, plus nr + br seeksIf both relations fit entirely
9、 in memory, in the best case. Reduces cost to br + bs block transfers and 2 seeksAssuming worst case memory availability cost estimate iswith depositor as outer relation:5000 400 + 100 = 2,000,100 block transfers,5000 + 100 = 5100 seeks with customer as the outer relation 10000 100 + 400 = 1,000,400
10、 block transfers and 10,400 seeksIn the best case, the cost estimate will be 500 block transfers.Block nested-loops algorithm (next slide) is preferable.Block Nested-Loop JoinVariant of nested-loop join in which every block of inner relation is paired with every block of outer relation.for each bloc
11、k Br of r do beginfor each block Bs of s do beginfor each tuple tr in Br do beginfor each tuple ts in Bs do beginCheck if (tr,ts) satisfy the join condition if they do, add tr ts to the result.endendendendBlock Nested-Loop Join (Cont.)Worst case estimate: br bs + br block transfers + 2 * br seeksEac
12、h block in the inner relation s is read once for each block in the outer relation (instead of once for each tuple in the outer relationBest case: br + bs block transfers + 2 seeks.Improvements to nested loop and block nested loop algorithms:In block nested-loop, use M 2 disk blocks as blocking unit
13、for outer relations, where M = memory size in blocks; use remaining two blocks to buffer inner relation and output Cost = br / (M-2) bs + br block transfers + 2 br / (M-2) seeksIf equi-join attribute forms a key or inner relation, stop inner loop on first matchScan inner loop forward and backward al
14、ternately, to make use of the blocks remaining in buffer (with LRU replacement)Use index on inner relation if available (next slide)Indexed Nested-Loop JoinIndex lookups can replace file scans ifjoin is an equi-join or natural join andan index is available on the inner relations join attributeCan co
15、nstruct an index just to compute a join.For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.Worst case: buffer has space for only one page of r, one page of index(for s), and, for each tuple in r, we perform an index lookup on
16、 s.Cost of the join: br (tT + tS) + nr cWhere c is the cost of traversing index and fetching all matching s tuples for one tuple or rc can be estimated as cost of a single selection on s using the join condition.If indices are available on join attributes of both r and s,use the relation with fewer
17、tuples as the outer relation.Example of Indexed Nested-Loop Join CostsCompute depositor customer, with depositor as the outer relation.Let customer have a primary B+-tree index on the join attribute customer-name, which contains 20 entries in each index node.Since customer has 10,000 tuples, the hei
18、ght of the tree is 4, and one more access is needed to find the actual datadepositor has 5000 tuplesCost of block nested loops join400*100 + 100 = 40,100 block transfers + 2 * 100 = 200 seeks assuming worst case memory may be significantly less with more memory Cost of indexed nested loops join100 +
19、 5000 * 5 = 25,100 block transfers and seeks.CPU cost likely to be less than that for block nested loops join Merge-JoinSort both relations on their join attribute (if not already sorted on the join attributes).Merge the sorted relations to join themJoin step is similar to the merge stage of the sor
20、t-merge algorithm. Main difference is handling of duplicate values in join attribute every pair with same value on join attribute must be matchedDetailed algorithm in bookMerge-Join (Cont.)Can be used only for equi-joins and natural joinsEach block needs to be read only once (assuming all tuples for
21、 any given value of the join attributes fit in memoryThus the cost of merge join is: br + bs block transfers + br / bb + bs / bb seeks+ the cost of sorting if relations are unsorted.bb is the number of buffer for the relations.Hash-JoinApplicable for equi-joins and natural joins.A hash function h is
22、 used to partition tuples of both relations h maps JoinAttrs values to 0, 1, ., n, where JoinAttrs denotes the common attributes of r and s used in the natural join. r0, r1, . . ., rn denote partitions of r tuplesEach tuple tr r is put in partition ri where i = h(tr JoinAttrs).r0, r1. . ., rn denote
23、s partitions of s tuplesEach tuple ts s is put in partition si, where i = h(ts JoinAttrs).Note: In book, ri is denoted as Hri, si is denoted as Hsi and n is denoted as nh. Hash-Join (Cont.)Hash-Join (Cont.)r tuples in ri need only to be compared with s tuples in si Need not be compared with s tuples
24、 in any other partition, since:an r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.If that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.Hash-Join Algorithm1.Partition the relation s using hashing function h.
25、 When partitioning a relation, one block of memory is reserved as the output buffer for each partition.2.Partition r similarly.3.For each i:(a)Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.(
26、b)Read the tuples in ri from the disk one by one. For each tuple tr locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes.The hash-join of r and s is computed as follows.Relation s is called the build input and r is called the probe input.Ha
27、sh-Join algorithm (Cont.)The value n and the hash function h is chosen such that each si should fit in memory.Typically n is chosen as bs/M * f where f is a “fudge factor”, typically around 1.2The probe relation partitions si need not fit in memoryRecursive partitioning required if number of partiti
28、ons n is greater than number of pages M of memory.If M2bs, ,no need recursive partitioning Cost of Hash-JoinIf recursive partitioning is not required: cost of hash join is 3(br + bs) +4 nh block transfers + 2( br / bb + bs / bb) seeks often 4 nh can be omitted. If the entire build input can be kept
29、in main memory no partitioning(nh=0) is requiredCost estimate goes down to br + bs.plus 2 seeksExample of Cost of Hash-JoinAssume that memory size is 20 blocksbdepositor= 100 and bcustomer = 400.assuming 3 buffers for these two relations. Therefore total cost, ignoring cost of writing partially fill
30、ed blocks:3(100 + 400) = 1500 block transfers +2( 100/3 + 400/3) = 336 seekscustomer depositorMaterializationMaterialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.E.g.,
31、 in figure below, compute and storethen compute the store its join with customer, and finally compute the projections on customer-name. Materialization (Cont.)Materialized evaluation is always applicableCost of writing results to disk and reading them back can be quite highOur cost formulas for oper
32、ations ignore cost of writing results to disk, soOverall cost = Sum of costs of individual operations + cost of writing intermediate results to diskDouble buffering: use two output buffers for each operation, when one is full write it to disk while the other is getting filledreduces execution timePi
33、peliningPipelined evaluation : evaluate several operations simultaneously, passing the results of one operation on to the next.E.g., in previous expression tree, dont store result of instead, pass tuples directly to the join. Similarly, dont store result of join, pass tuples directly to projection. Much cheaper than materializa
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 电机与电气控制技术 课件 任务7.1.1交流异步电机的调速控制
- 某著名企业高层管理人员薪酬调查报告0729
- 人血白蛋白临床使用规范总结2026
- 《GBT 9734-2008化学试剂 铝测定通 用方法》专题研究报告
- 《GBT 5009.49-2008发酵酒及其配制酒卫生标准的分析方法》专题研究报告
- 《GBT 22402-2008摄影 加工用化学品 无水硫代硫酸钠和五水合硫代硫酸钠》专题研究报告长文
- 《FZT 52048-2017有机阻燃粘胶短纤维》专题研究报告
- 道路安全教育培训班课件
- 道路交通类法律培训课件
- 2026年高校时政热点试题含解析及答案
- 眼镜验光师试题(及答案)
- 选人用人方面存在的问题及改进措施
- 项目管理流程标准作业程序手册
- 自我介绍礼仪课件
- 卫生院孕优知识培训课件
- 2025-2030工业窑炉烟气多污染物协同控制技术
- 培训机构台账
- 电商预算表格财务模板全年计划表格-做账实操
- 泵车日常管理办法
- 骨科术后疼痛评估与护理查房
- 输液泵的使用培训课件
评论
0/150
提交评论