数据库系统英文课件：ch14 Query Optimization

上传人：窝*** IP属地：安徽上传时间：2022-05-09 格式：PPTX 页数：47 大小：681.64KB 积分：30 举报 版权申诉

已阅读5页，还剩42页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

1、Database System Concepts 5th Ed.Silberschatz, Korth and SudarshanSee www.db- for conditions on re-use Silberschatz, Korth and Sudarshan14.2Database System Concepts - 5th Edition, Aug 27, 2005.nIntroduction nTransformation of Relational ExpressionsnCatalog Information for Cost EstimationnStatistical

2、Information for Cost EstimationnCost-based optimizationnDynamic Programming for Choosing Evaluation PlansnMaterialized views Silberschatz, Korth and Sudarshan14.3Database System Concepts - 5th Edition, Aug 27, 2005.nAlternative ways of evaluating a given querylEquivalent expressionslDifferent algori

3、thms for each operation (Chapter 13)nCost difference between a good and a bad way of evaluating a query can be enormousnNeed to estimate the cost of operationslStatistical information about relations. Examples:4number of tuples, 4number of distinct values for an attributes,4Etc.lStatistics estimatio

4、n for intermediate results4 to compute cost of complex expressionsSilberschatz, Korth and Sudarshan14.4Database System Concepts - 5th Edition, Aug 27, 2005.nRelations generated by two equivalent expressions have the same set of attributes and contain the same set of tupleslalthough their tuples/attr

5、ibutes may be ordered differently.Silberschatz, Korth and Sudarshan14.5Database System Concepts - 5th Edition, Aug 27, 2005.nGeneration of query-evaluation plans for an expression involves several steps:1.Generating logically equivalent expressions using equivalence rules 等价规则等价规则 .2.Annotating resu

6、ltant expressions to get alternative query plans3.Choosing the cheapest plan based on estimated costnThe overall process is called cost based optimization.Silberschatz, Korth and Sudarshan14.6Database System Concepts - 5th Edition, Aug 27, 2005.nTwo relational algebra expressions are said to be equi

7、valent if on every legal database instance the two expressions generate the same set of tupleslNote: order of tuples is irrelevantnIn SQL, inputs and outputs are multisets of tupleslTwo expressions in the multiset version of the relational algebra are said to be equivalent if on every legal database

8、 instance the two expressions generate the same multiset of tuplesnAn equivalence rule says that expressions of two forms are equivalentlCan replace expression of first form by second, or vice versaSilberschatz, Korth and Sudarshan14.7Database System Concepts - 5th Edition, Aug 27, 2005.1.Conjunctiv

9、e selection operations can be deconstructed into a sequence of individual selections.2.Selection operations are commutative.3.Only the last in a sequence of projection operations is needed, the others can be omitted.4.Selections can be combined with Cartesian products and theta joins.a.(E1 X E2) = E

10、1 E2 b.1(E1 2 E2) = E1 1 2 E2 )()(1221EE)()(2121EE)()(121EELLnLLSilberschatz, Korth and Sudarshan14.8Database System Concepts - 5th Edition, Aug 27, 2005.5. Theta-join operations (and natural joins) are commutative.E1 E2 = E2 E16. (a) Natural join operations are associative: (E1 E2) E3 = E1 (E2 E3)(

11、b) Theta joins are associative in the following manner: (E1 1 E2) 2 3 E3 = E1 2 3 (E2 2 E3) where 2 involves attributes from only E2 and E3.Silberschatz, Korth and Sudarshan14.9Database System Concepts - 5th Edition, Aug 27, 2005.Silberschatz, Korth and Sudarshan14.10Database System Concepts - 5th E

12、dition, Aug 27, 2005.7. The selection operation distributes over the theta join operation under the following two conditions:(a) When all the attributes in 0 involve only the attributes of one of the expressions (E1) being joined. 0E1 E2) = (0(E1) E2 (b) When 1 involves only the attributes of E1 and

13、 2 involves only the attributes of E2. 1 E1 E2) = (1(E1) ( (E2)Silberschatz, Korth and Sudarshan14.11Database System Concepts - 5th Edition, Aug 27, 2005.8. The projections operation distributes over the theta join operation as follows:(a) if involves only attributes from L1 L2:(b) Consider a join E

14、1 E2. l Let L1 and L2 be sets of attributes from E1 and E2, respectively. lLet L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, andl let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2.)()()(21212121EEEELLLL)()()(212142312121EEEELLL

15、LLLLLSilberschatz, Korth and Sudarshan14.12Database System Concepts - 5th Edition, Aug 27, 2005.9.The set operations union and intersection are commutative E1 E2 = E2 E1 E1 E2 = E2 E1 n(set difference is not commutative).10.Set union and intersection are associative. (E1 E2) E3 = E1 (E2 E3) (E1 E2)

16、E3 = E1 (E2 E3)11.The selection operation distributes over , and . (E1 E2) = (E1) (E2) and similarly for and in place of Also: (E1 E2) = (E1) E2 and similarly for in place of , but not for 12. The projection operation distributes over union L(E1 E2) = (L(E1) (L(E2) Silberschatz, Korth and Sudarshan1

17、4.13Database System Concepts - 5th Edition, Aug 27, 2005.nQuery: Find the names of all customers who have an account at some branch located in Brooklyn.customer_name(branch_city = “Brooklyn”(branch (account depositor)nTransformation using rule 7a. customer_name (branch_city =“Brooklyn” (branch) (acc

18、ount depositor)nPerforming the selection as early as possible reduces the size of the relation to be joined. Silberschatz, Korth and Sudarshan14.14Database System Concepts - 5th Edition, Aug 27, 2005.nQuery: Find the names of all customers with an account at a Brooklyn branch whose account balance i

19、s over $1000.customer_name(branch_city = “Brooklyn” balance 1000 (branch (account depositor)nTransformation using join associatively (Rule 6a):customer_name(branch_city = “Brooklyn” balance 1000 (branch account) depositor) nSecond form provides an opportunity to apply the “perform selections early”

20、rule, resulting in the subexpression branch_city = “Brooklyn” (branch) balance 1000 (account)nThus a sequence of transformations can be usefulSilberschatz, Korth and Sudarshan14.15Database System Concepts - 5th Edition, Aug 27, 2005.Silberschatz, Korth and Sudarshan14.16Database System Concepts - 5t

21、h Edition, Aug 27, 2005.nWhen we compute(branch_city = “Brooklyn” (branch) account )we obtain a relation whose schema is:(branch_name, branch_city, assets, account_number, balance)nPush projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get: cus

22、tomer_name ( account_number ( (branch_city = “Brooklyn” (branch) account ) depositor )nPerforming the projection as early as possible reduces the size of the relation to be joined. customer_name(branch_city = “Brooklyn” (branch) account) depositor) Silberschatz, Korth and Sudarshan14.17Database Syst

23、em Concepts - 5th Edition, Aug 27, 2005.nFor all relations r1, r2, and r3,(r1 r2) r3 = r1 (r2 r3 )nIf r2 r3 is quite large and r1 r2 is small, we choose (r1 r2) r3 so that we compute and store a smaller temporary relation.Silberschatz, Korth and Sudarshan14.18Database System Concepts - 5th Edition,

24、Aug 27, 2005.nConsider the expressioncustomer_name (branch_city = “Brooklyn” (branch) (account depositor)nCould compute account depositor first, and join result with branch_city = “Brooklyn” (branch)but account depositor is likely to be a large relation.nOnly a small fraction of the banks customers

25、are likely to have accounts in branches located in Brooklynl it is better to compute branch_city = “Brooklyn” (branch) accountfirst. Silberschatz, Korth and Sudarshan14.19Database System Concepts - 5th Edition, Aug 27, 2005.nQuery optimizers use equivalence rules to systematically generate expressio

26、ns equivalent to the given expressionnConceptually, generate all equivalent expressions by repeatedly executing the following step until no more expressions can be found: l for each expression found so far, use all applicable equivalence rules4 add newly generated expressions to the set of expressio

27、ns found so farnThe above approach is very expensive in space and timenSpace requirements reduced by sharing common subexpressions:lwhen E1 is generated from E2 by an equivalence rule, usually only the top level of the two are different, subtrees below are the same and can be shared4E.g. when applyi

28、ng join associativitynTime requirements are reduced by not generating all expressionslMore details shortlySilberschatz, Korth and Sudarshan14.20Database System Concepts - 5th Edition, Aug 27, 2005.nCost of each operator computer as described in Chapter 13lNeed statistics of input relations4E.g. numb

29、er of tuples, sizes of tuplesnInputs can be results of sub-expressionslNeed to estimate statistics of expression resultslTo do so, we require additional statistics4E.g. number of distinct values for an attributenMore on cost estimation laterSilberschatz, Korth and Sudarshan14.21Database System Conce

30、pts - 5th Edition, Aug 27, 2005.nnr: number of tuples in a relation r.nbr: number of blocks containing tuples of r.nlr: size of a tuple of r.nfr: blocking factor of r i.e., the number of tuples of r that fit into one block.nV(A, r): number of distinct values that appear in r for attribute A; same as

31、 the size of A(r).nIf tuples of r are stored together physically in a file, then: rfrnrbSilberschatz, Korth and Sudarshan14.22Database System Concepts - 5th Edition, Aug 27, 2005.nHistogram on attribute age of relation personnEqui-width histogramsnEqui-depth histogramsSilberschatz, Korth and Sudarsh

32、an14.23Database System Concepts - 5th Edition, Aug 27, 2005.n A=v(r)4nr / V(A,r) : number of records that will satisfy the selection4Equality condition on a key attribute: size estimate = 1nAV(r) (case of A V(r) is symmetric)lLet c denote the estimated number of tuples satisfying the condition. lIf

33、min(A,r) and max(A,r) are available in catalog4c = 0 if v min(A,r)4c =l If histograms available, can refine above estimatelIn absence of statistical information c is assumed to be nr / 2.),min(),max(),min(.rArArAvnrSilberschatz, Korth and Sudarshan14.24Database System Concepts - 5th Edition, Aug 27,

34、 2005.nThe selectivity of a condition i is the probability that a tuple in the relation r satisfies i . l If si is the number of satisfying tuples in r, the selectivity of i is given by si /nr.nConjunction: 1 2. . . n (r). Assuming indepdence, estimate of tuples in the result is:nDisjunction:1 2 . .

35、 . n (r). Estimated number of tuples:nNegation: (r). Estimated number of tuples:nr size(r)nrnrnsssn . . . 21)1 (.)1 ()1 (121rnrrrnsnsnsnSilberschatz, Korth and Sudarshan14.25Database System Concepts - 5th Edition, Aug 27, 2005.Running example: depositor customerCatalog information for join examples:

36、nncustomer = 10,000.nfcustomer = 25, which implies that bcustomer =10000/25 = 400.nndepositor = 5000.nfdepositor = 50, which implies that bdepositor = 5000/50 = 100.nV(customer_name, depositor) = 2500, which implies that , on average, each customer has two accounts.lAlso assume that customer_name in

37、 depositor is a foreign key on customer.lV(customer_name, customer) = 10000 (primary key!)Silberschatz, Korth and Sudarshan14.26Database System Concepts - 5th Edition, Aug 27, 2005.nThe Cartesian product r x s contains nr .ns tuples; each tuple occupies sr + ss bytes.nIf R S = , then |r s| = |r x s|

38、. nIf R S is a key for R, then a tuple of s will join with at most one tuple from rlthen |r s|= s.nIf R S in S is a foreign key in S referencing R, then |r s|= |s|.4The case for R S being a foreign key referencing S is symmetric.nIn the example query depositor customer, customer_name in depositor is

39、 a foreign key of customerl hence, the result has exactly ndepositor tuples, which is 5000Silberschatz, Korth and Sudarshan14.27Database System Concepts - 5th Edition, Aug 27, 2005.nIf R S = A is not a key for R or S.If we assume that every tuple t in R produces tuples in R S, the number of tuples i

40、n R S is estimated to be:If the reverse is true, the estimate obtained will be:The lower of these two estimates is probably the more accurate one.nCan improve on above if histograms are availablelUse formula similar to above, for each cell of histograms on the two relations ),(sAVnnsr),(rAVnnsrSilbe

41、rschatz, Korth and Sudarshan14.28Database System Concepts - 5th Edition, Aug 27, 2005.nCompute the size estimates for depositor customer without using information about foreign keys:lV(customer_name, depositor) = 2500, andV(customer_name, customer) = 10000lThe two estimates are 5000 * 10000/2500 =20

42、,000 and 5000 * 10000/10000 = 5000lWe choose the lower estimate, which in this case, is the same as our earlier computation using foreign keys.Silberschatz, Korth and Sudarshan14.29Database System Concepts - 5th Edition, Aug 27, 2005.nProjection: estimated size of A(r) = V(A,r)nAggregation : estimat

43、ed size of Ag gF(r) = V(A,r)nSet operationsl For unions/intersections of selections on the same relation: rewrite and use size estimate for selections4E.g. 1 (r) 2 (r) can be rewritten as 1 2 (r)lFor operations on different relations:4| r s| = | r| + | s|. 4| r s| = min(|r|, |s|).4| r s| = r.4All th

44、e three estimates may be quite inaccurate, but provide upper bounds on the sizes.Silberschatz, Korth and Sudarshan14.30Database System Concepts - 5th Edition, Aug 27, 2005.nOuter join: lEstimated size of r s = size of r s + size of r4Case of right outer join is symmetriclEstimated size of r s = size

45、 of r s + size of r + size of sSilberschatz, Korth and Sudarshan14.31Database System Concepts - 5th Edition, Aug 27, 2005.Selections: (r) nIf forces A to take a specified value: V(A, (r) = 1.4e.g., A = 3nIf forces A to take on one of a specified set of values: V(A, (r) = number of specified values.4

46、(e.g., (A = 1 V A = 3 V A = 4 ), nIf the selection condition is of the form A op restimated V(A, (r) = V(A.r) * s4where s is the selectivity of the selection 选择的中选率.nIn all the other cases: use approximate estimate of min(V(A,r), n (r) )lMore accurate estimate can be got using probability theory, bu

47、t this one works fine generallySilberschatz, Korth and Sudarshan14.32Database System Concepts - 5th Edition, Aug 27, 2005.Joins: r snIf all attributes in A are from r estimated V(A, r s) = min (V(A,r), n r s)nIf A contains attributes A1 from r and A2 from s, then estimated V(A,r s) = min(V(A1,r)*V(A

48、2 A1,s), V(A1 A2,r)*V(A2,s), nr s)l More accurate estimate can be got using probability theory, but this one works fine generally Silberschatz, Korth and Sudarshan14.33Database System Concepts - 5th Edition, Aug 27, 2005.nEstimation of distinct values are straightforward for projections.lThey are th

49、e same in A (r) as in r. nThe same holds for grouping attributes of aggregation.nFor aggregated values lFor min(A) and max(A), the number of distinct values can be estimated as min(V(A,r), V(G,r) where G denotes grouping attributeslFor other aggregates, assume all values are distinct, and use V(G,r)

50、Silberschatz, Korth and Sudarshan14.34Database System Concepts - 5th Edition, Aug 27, 2005.nAn evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.Silberschatz, Korth and Sudarshan14.35Database System Concepts - 5th Editio

51、n, Aug 27, 2005.nMust consider the interaction of evaluation techniques when choosing evaluation plans: choosing the cheapest algorithm for each operation independently may not yield best overall algorithm. E.g.lmerge-join may be costlier than hash-join, but may provide a sorted output which reduces

52、 the cost for an outer level aggregation.lnested-loop join may provide opportunity for pipeliningnPractical query optimizers incorporate elements of the following two broad approaches:1. Search all the plans and choose the best plan in a cost-based fashion.2. Uses heuristics to choose a plan.Silbers

53、chatz, Korth and Sudarshan14.36Database System Concepts - 5th Edition, Aug 27, 2005.nConsider finding the best join-order for r1 r2 . . . rn.nThere are (2(n 1)!/(n 1)! different join orders for above expression. With n = 7, the number is 665280, with n = 10, the number is greater than 176 billion!nN

54、o need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of r1, r2, . . . rn is computed only once and stored for future use. Silberschatz, Korth and Sudarshan14.37Database System Concepts - 5th Edition, Aug 27, 2005.nTo find best join tree for a se

55、t of n relations:lTo find best plan for a set S of n relations, consider all possible plans of the form: S1 (S S1) where S1 is any non-empty subset of S.lRecursively compute costs for joining subsets of S to find the cost of each plan. Choose the cheapest of the 2n 1 alternatives.lWhen plan for any

56、subset is computed, store it and reuse it when it is required again, instead of recomputing it4Dynamic programmingSilberschatz, Korth and Sudarshan14.38Database System Concepts - 5th Edition, Aug 27, 2005.procedure findbestplan(S) 入口参数 S=r1, r2, , rnif (bestplanS.cost )return bestplanS/ else bestpla

57、nS has not been computed earlier, compute it nowif (S contains only 1 relation) set bestplanS.plan and bestplanS.cost based on the best way of accessing S else for each non-empty subset S1 of S such that S1 SP1= findbestplan(S1)P2= findbestplan(S - S1)A = best algorithm for joining results of P1 and

58、 P2cost = P1.cost + P2.cost + cost of Aif cost bestplanS.cost bestplanS.cost = costbestplanS.plan = “execute P1.plan; execute P2.plan; join results of P1 and P2 using A”return bestplanSSilberschatz, Korth and Sudarshan14.39Database System Concepts - 5th Edition, Aug 27, 2005.nWith dynamic programmin

59、g time complexity of optimization is O(3n). lWith n = 10, this number is 59000 instead of 176 billion!nSpace complexity is O(2n) Silberschatz, Korth and Sudarshan14.40Database System Concepts - 5th Edition, Aug 27, 2005.nIn left-deep join trees, the right-hand-side input for each join is a relation,

60、 not the result of an intermediate join.Silberschatz, Korth and Sudarshan14.41Database System Concepts - 5th Edition, Aug 27, 2005.nTo find best left-deep join tree for a set of n relations:lConsider n alternatives with one relation as right-hand side input and the other relations as left-hand side

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

数据库系统英文课件：ch14 Query Optimization

文档简介

温馨提示

最新文档

评论

数据库系统英文课件：ch14 Query Optimization

文档简介

温馨提示

最新文档

评论

相关文档