




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、COMP231Query OptimizationPrepared by Raymond WongPresented by Raymond WongCOMP2311OutlineIntroductionMaterialization and On-the-flyJoin over Two TablesJoin over Multiple TablesCOMP2312IntroductionAfter users write SQL statements, They want to obtain the results quicklyE.g., student (sid, sname, age)
2、 take (sid, cid) select sname from student S, take T where S.sid = T.sid and T.cid = “231” and S.age 20However, there are many ways to obtain the resultsFind the name of all students with age greater than 20 who take 231. COMP2313SQL select sname from student S, take T where S.sid = T.sid and T.cid
3、= “231” and S.age 20Translate the query to relational algebra operations sname(cid=231age20(Take Take.sid=Student.sid Student)sname(cid=231age20(Take Take.sid=Student.sid Student)COMP2314Evaluation Plan 1Take TStudent ST.sid=S.sidcid=231 age 20snameEvaluation planNodes are relational operators or ta
4、blesEdges point to where the input comes fromsname(cid=231age20(Take Take.sid=Student.sid Student)COMP2315Evaluation Plan 2Take TStudent ST.sid=S.sidcid=231 snameage 20sname(cid=231age20(Take Take.sid=Student.sid Student)COMP2316Evaluation Plan 3Student STake TT.sid=S.sidcid=231 snameage 20sname(cid
5、=231age20(Take Take.sid=Student.sid Student)COMP2317Evaluation Plan 4Take TStudent Ssid=sidcid=231 snameage 20sname(cid=231age20(Take Take.sid=Student.sid Student)COMP2318DBMSs job is to optimize the users query byConverting the query to a number of evaluation plansEstimate the cost of each evaluati
6、on planFind the evaluation plan with the lowest costsname(cid=231age20(Take Take.sid=Student.sid Student)COMP2319OutlineIntroductionMaterialization and On-the-flyJoin over Two TablesJoin over Multiple TablesCOMP23110Materialization and On-the-flyLet op1 and op2 be two relational algebra operations,
7、and op1 is performed on the result of op2The evaluation of op1 is on-the-fly if the result of op2 is directly sent to op1 (i.e, not stored in a temporary file) It is materialized if that result is stored in a temporary file firstOn-the-fly evaluation is called pipelined evaluationOn-the-fly can be m
8、ore efficient than materialized sname(cid=231age20(Take Take.sid=Student.sid Student)COMP23111Materialization and On-the-flyMaterializationOn-the-flysname(cid=231age20(Take Take.sid=Student.sid Student)COMP23112Materializationsname(cid=231age20(Take Take.sid=Student.sid Student)Disksidsnameage100Ray
9、mond20200Peter21sidcid100231200231200170StudentTakeMemoryReadingsidsnameage100Raymond20200Peter21sidcid100231200231200170Joinsidsnameagecid100Raymond20231200Peter21231200Peter21170Writingsidsnameagecid100Raymond20231200Peter21231200Peter21170MaterializationCOMP23113Materializationsname(cid=231age20(
10、Take Take.sid=Student.sid Student)Disksidsnameage100Raymond20200Peter21sidcid100231200231200170StudentTakeMemorysidsnameagecid100Raymond20231200Peter21231200Peter21170COMP23114Materializationsname(cid=231age20(Take Take.sid=Student.sid Student)Disksidsnameage100Raymond20200Peter21sidcid1002312002312
11、00170StudentTakeMemorysidsnameagecid100Raymond20231200Peter21231200Peter21170Readingsidsnameagecid100Raymond20231200Peter21231200Peter21170sidsnameagecid200Peter21231snamePetercid=231age20snameOutput to the screenCOMP23115Materialization and On-the-flyMaterializationOn-the-flysname(cid=231age20(Take
12、 Take.sid=Student.sid Student)COMP23116On-the-flysname(cid=231age20(Take Take.sid=Student.sid Student)Disksidsnameage100Raymond20200Peter21sidcid100231200231200170StudentTakeMemoryReadingsidsnameage100Raymond20200Peter21sidcid100231200231200170Joinsidsnameagecid100Raymond20231200Peter21231200Peter21
13、170COMP23117On-the-flysname(cid=231age20(Take Take.sid=Student.sid Student)Disksidsnameage100Raymond20200Peter21sidcid100231200231200170StudentTakeMemorysidsnameagecid100Raymond20231200Peter21231200Peter21170sidsnameagecid200Peter21231snamePetercid=231age20snameOutput to the screenCOMP23118OutlineIn
14、troductionMaterialization and On-the-flyJoin over Two TablesJoin over Multiple TablesCOMP23119Example SchemaAssuming the following table sizes:Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tupleAssume that there are 200 courses in table TakeStudent: 50
15、0 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCOMP23120Example SchemaJoinSimple-nested loop JoinBlock-nested loop JoinSort-Merge JoinIndex-Nested loop JoinStudent: 500 pages, 80 tuples/p
16、age, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCOMP23121Evaluation Plan 1Take TStudent ST.sid=S.sidcid=231 age 20sname(File scan)(File scan)(On-the-fly)(On-the-fly)(Simple-NestedLoop Join)Student: 500 pages,
17、 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tupleCost = 500+500*1000 =500500 pagesWe do NOT calculate the cost of writing the output. Why?- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentSuppose that Student is the outer relationCOMP23122Evaluati
18、on Plan 1Take TStudent ST.sid=S.sidcid=231 age 20sname(File scan)(File scan)(On-the-fly)(On-the-fly)(Simple-NestedLoop Join)Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tupleCost = 1000+1000*500 =501000 pages- 200 courses in table Take- 1, 2, , 40 in
19、attribute Ageof table StudentSuppose that Take is the outer relationIf Student is the outer relation,then the cost is 500500 pagesIf Take is the outer relation,then the cost is 501000 pagesWhich one should we choose?COMP23123Evaluation Plan 2Take TStudent ST.sid=S.sidcid=231 snameage 20(File scan)(F
20、ile scan)(Write to Temp T1)(Write to Temp T2)(Simple-NestedLoop Join)(On-the-fly)Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCOMP23124Evaluation Plan 2Take TStudent ST.sid=S
21、.sidcid=231 snameage 20(File scan)(File scan)(Write to Temp T1)(Write to Temp T2)(Simple-NestedLoop Join)(On-the-fly)Cost of Reading T = 1000 pagesStudent: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tupleSize of T1 = 1000/200 = 5 pages- 200 courses in table
22、Take- 1, 2, , 40 in attribute Ageof table StudentCost of Writing T1 = 5 pagesCost of Reading S = 500 pagesSize of T2 = 500/2 = 250 pagesSuppose that T1 is theouter relationCost of Join = 5 + 5*250= 1255 pagesTotal cost = 1000 + 5 + 500 + 250 + 1255 = 3010 pagesCost of Writing T2 = 250 pagesCOMP23125
23、Evaluation Plan 2Take TStudent ST.sid=S.sidcid=231 snameage 20(File scan)(File scan)(Write to Temp T1)(Write to Temp T2)(Simple-NestedLoop Join)(On-the-fly)Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in at
24、tribute Ageof table StudentCOMP23126Which evaluation plan is better?Plan 1 or Plan 2?Why?COMP23127Evaluation Plan 3Student STake TT.sid=S.sidcid=231 snameage 20Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 i
25、n attribute Ageof table Student(File Scan)(Write to temp T1)(File Scan)(Simple-NestedLoop Join)(On-the-fly)(On-the-fly)Similar derivations can be done.COMP23128Evaluation Plan 3Student STake TT.sid=S.sidcid=231 snameage 20Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples
26、/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table Student(File Scan)(Write to temp T1)(File Scan)(Simple-NestedLoop Join)(On-the-fly)(On-the-fly)COMP23129Evaluation Plan 4Take TStudent Ssid=sidcid=231 snameage 20Student: 500 pages, 80 tuples/page, 50 bytes/tupleTa
27、ke: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table Student(File Scan)(Write to temp T1)(File Scan)(Simple-NestedLoop Join)(On-the-fly)(On-the-fly)Similar derivations can be done.COMP23130Example SchemaJoinSimple-nested loop JoinBlock-neste
28、d loop JoinSort-Merge JoinIndex-Nested loop JoinStudent: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCOMP23131Evaluation Plan 2Take TStudent ST.sid=S.sidcid=231 snameage 20(File scan
29、)(File scan)(Write to Temp T1)(Write to Temp T2)(Sort-Merge Join)(On-the-fly)Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCost of Sorting = 2N * (# of passes)where no. of pas
30、ses=Size of the buffer = 3 pagesCOMP23132Evaluation Plan 2Take TStudent ST.sid=S.sidcid=231 snameage 20(File scan)(File scan)(Write to Temp T1)(Write to Temp T2)(Sort-Merge Join)(On-the-fly)Cost of Reading T = 1000 pagesStudent: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/p
31、age, 40 bytes/tupleSize of T1 = 1000/200 = 5 pages- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table StudentCost of Writing T1 = 5 pagesCost of Reading S = 500 pagesSize of T2 = 500/2 = 250 pagesTotal cost = 1000 + 5 + 500 + 250 + 20 + 4000 + 255 = 6030 pagesSize of the buffer = 3 page
32、sCost of Sorting T1 = 2*5*( )= 20 pagesCost of Sorting T2 = 2*250*( )= 4000 pagesCost of Merging = cost of reading T1 and T2= 5 + 250 = 255 pagesCost of Reading S = 500 pagesCost of Writing T2 = 250 pagesCOMP23133Evaluation Plan 3Take TStudent ST.sid=S.sidage 20 snamecid=231Student: 500 pages, 80 tu
33、ples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table Student(File Scan)(Write to temp T1)(Sort-MergeJoin)(On-the-fly)(On-the-fly)Size of the buffer = 3 pagesAssume that the tuples in Table Student are sorted accord
34、ing to sidCOMP23134Evaluation Plan 3Take TStudent ST.sid=S.sidage 20 snamecid=231Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table Student(File Scan)(Write to temp T1)(Sort-MergeJoin)(On
35、-the-fly)(On-the-fly)Size of the buffer = 3 pagesCost of Reading T = 1000 pagesSize of T1 = 1000/200 = 5 pagesCost of Writing T1 = 5 pagesTotal cost = 1000 + 5 + 20 + 505 = 1530 pagesCost of Sorting T1 = 2*5*( )= 20 pagesCost of Merging = Cost of Reading T1 and S= 5 + 500 = 505 pagesSince S is sorte
36、d according to sid,we do NOT need to sort SCOMP23135Example SchemaJoinSimple-nested loop JoinBlock-nested loop JoinSort-Merge JoinIndex-Nested loop JoinStudent: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attrib
37、ute Ageof table StudentSize of the buffer = 3 pagesCOMP23136Evaluation Plan 4Take TStudent Ssid=sidcid=231 snameage 20Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in attribute Ageof table Student(Use Hash I
38、ndex on cid; Do not write Result to temp)(Hash Index on sid)(Index Nested LoopJoin with Pipelining)(On-the-fly)(On-the-fly)Size of the buffer = 3 pagesAssume the following.T is sorted according to cidT has a clustered hash index on cidS has an unclustered hash index on sidAssume that the (average) c
39、ost of reading a hashindex is 1.2 pagesCOMP23137(Use Hash Index on cid; Do not write Result to temp)Evaluation Plan 4Take TStudent Ssid=sidcid=231 snameage 20Student: 500 pages, 80 tuples/page, 50 bytes/tupleTake: 1000 pages, 100 tuples/page, 40 bytes/tuple- 200 courses in table Take- 1, 2, , 40 in
40、attribute Ageof table Student(Hash Index on sid)(Index Nested LoopJoin with Pipelining)(On-the-fly)(On-the-fly)Size of the buffer = 3 pagesCost of Reading T with Hash Index = Cost of Reading Hash Index + Cost of Reading T containing tuples with cid = 231= 1.2 + 5 = 6.2 pagesSize of the temp result =
41、 1000/200 = 5 pagesTotal cost = 6.2 + 1100 = 1106.2 pagesCost of Index Nested Loop Join= Cost of Reading S with Hash Index= 500*(1.2+1)= 1100 pagesNo. of tuples in T with cid = 231 = 100*5 = 500 COMP23138OutlineIntroductionMaterialization and On-the-flyJoin over Two TablesJoin over Multiple TablesCOMP23139Join over Multiple TablesRelational Algebra EquivalencesLeft-Deep PlanCOMP23140Allow us to choose dif
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 苏教版小学数学六年级下册全教案
- 《2025跨国租赁合同的附件》
- 2025承诺担保合同
- 2025瓷砖铺设合同范本
- 音乐教育的魅力
- 引领创新 家居先锋
- 2025农民林地流转合同
- 《2025网络招标代理服务合同》
- 2025肉牛交易合同模板
- 家装工艺培训课件
- 2025年FRM金融风险管理师考试专业试卷(金融风险管理案例分析)
- 泥尾运输合同协议
- 低压电器 课件 单元三 项目三 任务一 掌握接触器联锁正反转控制线路
- 食堂食品追溯管理制度
- 北京市石景山区2025年高三统一练习(生物及答案)(石景山一模)
- 森林火灾风险评估-全面剖析
- 2024上海市招聘社区工作者考试题及参考答案
- 2024年河北省初中学业水平适应性测试生物学试卷
- 《蒋勋眼中的宋词》阅读练习及答案
- 全回转钻机在国内的发展与应用(课堂PPT)
- 自制A4纸田字格模板(可直接打印版).xls2014.9.14
评论
0/150
提交评论