Cost-basedQueryScramblingforInitialDelays_第1页
Cost-basedQueryScramblingforInitialDelays_第2页
Cost-basedQueryScramblingforInitialDelays_第3页
Cost-basedQueryScramblingforInitialDelays_第4页
Cost-basedQueryScramblingforInitialDelays_第5页
已阅读5页,还剩23页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、Cost-based Query Scrambling for Initial DelaysTolga UrhanMichael J. FranklinLaurent Amsaleg1Introduction:Problem: Remote data access (e.g. from the Web) in query processing introduce unpredictable delays.Traditional query scrambling based on simple heuristics is susceptible to problems from bad scra

2、mbling decisions.Three different approaches to using query optimization for scrambling are proposed.More intelligent scrambling decisions are made.2Query Scrambling Overview:Goal: To hide the delays encountered when obtaining data from remote sources by performing other useful work.Consists of two p

3、hases:Rescheduling phaseOperator Synthesis phaseObjective function of optimization based either on total workor response time 3Goals of this paper:Focus on the problem of initial delay:delay in receiving the first tuple from a particular remote source.Due to difficulty in establishing a connection t

4、o a remote source,heavy load of the remote sourcelarge amount of work remote source needs to perform (lack of global optimization)Investigate both the use of response time-based and total work-based optimization for query scrambling4Assumption:Query execution environment consists of a query site and

5、 remote data sources.Processing work occurs in both query site and remote sitesexample of a complex query tree:DEBACQuery ResultCommunicationLinkSelectJoinCost-based Query Scrambling:5Iterative Process of Query Scrambling(1) Rescheduling: execution plan of a query is dynamically rescheduled when del

6、ay is detected. (2) Operator Synthesis: new operators can be created when there are no other operators that can execute. E.g.: Stalls in getting A tuples:(1) retrieve B tuples, execute D join E(2) execute a new join between B and (D join E) join C6Cost-based Rescheduling:Identify runnable subtrees:

7、subtrees made up entirely of nonbocked operators.Runnable subtrees can be scheduled out of order by inserting materialization operators at its root.Materialization operators: they issue Open, Next, close calls to the root of the subtree and save results in a temporary relation.7Selection of runnable

8、 subtrees to execute:Traditional way: “maximal” one.Maximal efficiency: (P - MR)/(P + MW)MW: cost of writing materialized temporary result to diskMR: cost of reading temporary results form diskP: cost of executing the subtreeefficiency: improvement in response time per unit of scrambling executionAn

9、other iteration of rescheduling is started until the delayed data has arrived.Cost-based Rescheduling:8Second phase starts when no more progress can be made in phase 1.Three approaches of optimization strategies:Pair (IN) Include Delayed(ED) Estimated DelayCost-based Operator Synthesis:9Construct a

10、query plan containing only a single join using two unblocked relations.Analyzes each pair of unblocked relations sharing a join predicate.Chooses the join with the least total cost to execute.Materialize the results of the join to disk.Avoids Cartesian products, joins whose produced results take lon

11、ger to read from disk than to compute from scratch.Pair: total work-based optimizer10At the end of each join, checks for the arrival of delayed data. If not arrived, do another iteration If no qualified joins exist, wait for delayed data to arriveReconstruction phase:when all blocked relations becom

12、e available, need to construct a single query treenecessary, since Pair policy works only on pairs of relations and does not maintain a complete query planPair continued:11Each iteration generates a complete alternative planChooses a very long delay duration (relative to response time) to postpone a

13、ny access to the delayed data.Chooses a plan with the greatest benefit (potential improvement in response time) whose risk (duration of the optimization step) can be overlapped with the expected delay duration.IN (Include Delayed): response time-based optimizer12Use risk/benefit knob (Rbknob) to pre

14、vent optimizer from choosing high-risk plans for relatively small potential gains over low risk plans.Rbknob: ratio of the amount of benefit the optimizer is willing to give up for a given savings in risk.Increasing Rbknob - more conservative plans.IN Contined:13Similar to IN except that it uses rel

15、atively short delay estimates (relative to the response time of the non-delayed plan) Delay estimates successively increase when necessary to make more progressMotivation: Use low risk plans when delays are short, use high risk/high pay off plans for larger delays.ED (Estimated Delay): response time

16、-based optimizer14Execution steps:Starts by picking an estimated delay value equal to 25% of the original query response timeRepeat iterations until progress is too smallIncrease delay value to 50% of response timeIncrease to 100% of response time if progress is still too small.For short delays: scr

17、ambling more usefulFor long delays: more aggressive.ED Continued:15Two-phase randomized query optimizerWorkload based on queries from TPC-D benchmarksSingle query site, six remote data source sites.Experimental methodology: plots the duration of initial delay of a remote source vs. the response time

18、 achieved using scrambling Experimental Setup:16One case study:171. With sufficient memory, all cost-based approaches (Pair, IN, ED) can effectively hide initial delays.When delayed relations are encountered early in the query execution, delay as long as normal response time can be absorbed.Heuristi

19、c-based algorithm performs worse than original query w/o scrambling, unless there is substantial amount of extra memory for scramblingLessons learned:182. Cost-based scrambling: tradeoff between conservative approaches and aggressive ones. conservative: safer for short delaysaggressive: bigger savin

20、gs for long delaysAmount of delay to be hidden is limited by the normal response time of the queryAs delay increases beyond normal response time, benefits of scrambling as a percentage of total execution time decreases.So, maybe more conservative plans?Lessons learned:193. As memory available for sc

21、rambling is reduced, scrambling plans are more expensive. Longer delay duration is needed for scrambling to pay offGood predictions of delay duration needed to make good scrambling decisionsLessons Learned:204. Aggressiveness of IN and ED policies can be adjusted using Rbknob.Give up potential gains

22、 for long delays in order to reduce risks for short delaysThis tradeoff is useful in the absence of accurate predictions of delay duration.Lessons Learned:215. Pair (total work-based optimizer) may perform unnecessary worklack of a global view of the scrambled planunable to pick slightly sub-optimal

23、 plan to generate interesting ordersTherefore, response time should be used as the objective function to generated a complete and reasonable scrambled plan.Lessons Learned:22Discussion:How to tune Rbknob?Query scrambling can be very dangerous, when delay duration is short. How to reduce the risks?Cr

24、oss products might be Okay sometimes?Reality of scenarios studied?QS treats remote sites as black boxes. Additional processing at data source sites?Nonblocking join algorithm instead of hash join?23Discussion continued:Better delay duration estimates? (using probes)Quality of decisions limited by th

25、at of optimizer.Replicas complicate problems?Query scrambling decision should take selectivity, size of intermediate results, importance of operators into consideration.Only addressed the problem of initial delay, ignores bandwidth problem.24Discussion continued:Checking for arrival of delayed data during a scrambling step?25Do not adapt to changes in the system parameters during query execution:Volcano optimizer: introduces choose-plan operators into a query plan to compensate for the lack of information about system parameters at co

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论