版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Cost-based Query Scrambling for Initial DelaysTolga UrhanMichael J. FranklinLaurent Amsaleg1Introduction:Problem: Remote data access (e.g. from the Web) in query processing introduce unpredictable delays.Traditional query scrambling based on simple heuristics is susceptible to problems from bad scra
2、mbling decisions.Three different approaches to using query optimization for scrambling are proposed.More intelligent scrambling decisions are made.2Query Scrambling Overview:Goal: To hide the delays encountered when obtaining data from remote sources by performing other useful work.Consists of two p
3、hases:Rescheduling phaseOperator Synthesis phaseObjective function of optimization based either on total workor response time 3Goals of this paper:Focus on the problem of initial delay:delay in receiving the first tuple from a particular remote source.Due to difficulty in establishing a connection t
4、o a remote source,heavy load of the remote sourcelarge amount of work remote source needs to perform (lack of global optimization)Investigate both the use of response time-based and total work-based optimization for query scrambling4Assumption:Query execution environment consists of a query site and
5、 remote data sources.Processing work occurs in both query site and remote sitesexample of a complex query tree:DEBACQuery ResultCommunicationLinkSelectJoinCost-based Query Scrambling:5Iterative Process of Query Scrambling(1) Rescheduling: execution plan of a query is dynamically rescheduled when del
6、ay is detected. (2) Operator Synthesis: new operators can be created when there are no other operators that can execute. E.g.: Stalls in getting A tuples:(1) retrieve B tuples, execute D join E(2) execute a new join between B and (D join E) join C6Cost-based Rescheduling:Identify runnable subtrees:
7、subtrees made up entirely of nonbocked operators.Runnable subtrees can be scheduled out of order by inserting materialization operators at its root.Materialization operators: they issue Open, Next, close calls to the root of the subtree and save results in a temporary relation.7Selection of runnable
8、 subtrees to execute:Traditional way: “maximal” one.Maximal efficiency: (P - MR)/(P + MW)MW: cost of writing materialized temporary result to diskMR: cost of reading temporary results form diskP: cost of executing the subtreeefficiency: improvement in response time per unit of scrambling executionAn
9、other iteration of rescheduling is started until the delayed data has arrived.Cost-based Rescheduling:8Second phase starts when no more progress can be made in phase 1.Three approaches of optimization strategies:Pair (IN) Include Delayed(ED) Estimated DelayCost-based Operator Synthesis:9Construct a
10、query plan containing only a single join using two unblocked relations.Analyzes each pair of unblocked relations sharing a join predicate.Chooses the join with the least total cost to execute.Materialize the results of the join to disk.Avoids Cartesian products, joins whose produced results take lon
11、ger to read from disk than to compute from scratch.Pair: total work-based optimizer10At the end of each join, checks for the arrival of delayed data. If not arrived, do another iteration If no qualified joins exist, wait for delayed data to arriveReconstruction phase:when all blocked relations becom
12、e available, need to construct a single query treenecessary, since Pair policy works only on pairs of relations and does not maintain a complete query planPair continued:11Each iteration generates a complete alternative planChooses a very long delay duration (relative to response time) to postpone a
13、ny access to the delayed data.Chooses a plan with the greatest benefit (potential improvement in response time) whose risk (duration of the optimization step) can be overlapped with the expected delay duration.IN (Include Delayed): response time-based optimizer12Use risk/benefit knob (Rbknob) to pre
14、vent optimizer from choosing high-risk plans for relatively small potential gains over low risk plans.Rbknob: ratio of the amount of benefit the optimizer is willing to give up for a given savings in risk.Increasing Rbknob - more conservative plans.IN Contined:13Similar to IN except that it uses rel
15、atively short delay estimates (relative to the response time of the non-delayed plan) Delay estimates successively increase when necessary to make more progressMotivation: Use low risk plans when delays are short, use high risk/high pay off plans for larger delays.ED (Estimated Delay): response time
16、-based optimizer14Execution steps:Starts by picking an estimated delay value equal to 25% of the original query response timeRepeat iterations until progress is too smallIncrease delay value to 50% of response timeIncrease to 100% of response time if progress is still too small.For short delays: scr
17、ambling more usefulFor long delays: more aggressive.ED Continued:15Two-phase randomized query optimizerWorkload based on queries from TPC-D benchmarksSingle query site, six remote data source sites.Experimental methodology: plots the duration of initial delay of a remote source vs. the response time
18、 achieved using scrambling Experimental Setup:16One case study:171. With sufficient memory, all cost-based approaches (Pair, IN, ED) can effectively hide initial delays.When delayed relations are encountered early in the query execution, delay as long as normal response time can be absorbed.Heuristi
19、c-based algorithm performs worse than original query w/o scrambling, unless there is substantial amount of extra memory for scramblingLessons learned:182. Cost-based scrambling: tradeoff between conservative approaches and aggressive ones. conservative: safer for short delaysaggressive: bigger savin
20、gs for long delaysAmount of delay to be hidden is limited by the normal response time of the queryAs delay increases beyond normal response time, benefits of scrambling as a percentage of total execution time decreases.So, maybe more conservative plans?Lessons learned:193. As memory available for sc
21、rambling is reduced, scrambling plans are more expensive. Longer delay duration is needed for scrambling to pay offGood predictions of delay duration needed to make good scrambling decisionsLessons Learned:204. Aggressiveness of IN and ED policies can be adjusted using Rbknob.Give up potential gains
22、 for long delays in order to reduce risks for short delaysThis tradeoff is useful in the absence of accurate predictions of delay duration.Lessons Learned:215. Pair (total work-based optimizer) may perform unnecessary worklack of a global view of the scrambled planunable to pick slightly sub-optimal
23、 plan to generate interesting ordersTherefore, response time should be used as the objective function to generated a complete and reasonable scrambled plan.Lessons Learned:22Discussion:How to tune Rbknob?Query scrambling can be very dangerous, when delay duration is short. How to reduce the risks?Cr
24、oss products might be Okay sometimes?Reality of scenarios studied?QS treats remote sites as black boxes. Additional processing at data source sites?Nonblocking join algorithm instead of hash join?23Discussion continued:Better delay duration estimates? (using probes)Quality of decisions limited by th
25、at of optimizer.Replicas complicate problems?Query scrambling decision should take selectivity, size of intermediate results, importance of operators into consideration.Only addressed the problem of initial delay, ignores bandwidth problem.24Discussion continued:Checking for arrival of delayed data during a scrambling step?25Do not adapt to changes in the system parameters during query execution:Volcano optimizer: introduces choose-plan operators into a query plan to compensate for the lack of information about system parameters at co
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024幼儿园特色课程开发与教师聘用合同2篇
- 2025年度城市道路桥梁养护与维修合同范本3篇
- 2024年餐馆承包经营协议6篇
- 2024年车联网技术研究与应用合同
- 2025年度化学品船运输安全责任协议书模板3篇
- 2024版文化创意产业项目投资与合作协议
- (完整版)信号与系统(吴大正)-完整版答案-纠错修改后版本
- 世界现代设计史简述
- 克雷洛夫寓言中的狐狸和乌鸦好词好句读后感
- 浙江理工大学《城市经济学》2023-2024学年第一学期期末试卷
- (完整版)100道凑十法练习题
- 光伏逆变器一课件
- 2023年上海师范大学辅导员招聘考试笔试题库及答案解析
- (完整版)英语高频词汇800词
- 严重精神障碍患者发病报告卡
- 《基础马来语》课程标准(高职)
- 2021年国标热镀锌钢管规格、尺寸理论重量表
- 乌鲁木齐基准地价修正体系
- DB32-T 3177-2017草莓-蕹菜水旱轮作设施栽培技术规程 -(高清现行)
- GB∕T 3216-2016 回转动力泵 水力性能验收试验 1级、2级和3级
- 七年级数学资料培优汇总精华
评论
0/150
提交评论