版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
DatabaseSystem
Problem1
Thehashjoinalgorithmdescribedinthetextbookcomputesthenaturaljoinoftworelations.Writepseudocodeforaniteratorthatimplementsahashbasedalgorithmforcomputingan“anti-join”operation:Ranti-joinScontainsalltuplesinRthathavenomatchingtuplesinS.Althoughanti-joinisnotanoperationthatcanbespecifieddirectlyinSQL,itisimplementedinthequeryprocessortosupportsometypesofqueries.Yourpseudocodemustdefinethestandarditeratorfunctionsopen(),next(),andclose(),andmustfetchtheinputtuplesusingiteratorcalls(inotherwords,RandSmaynotbebaserelations,butmaybeoutputsofotheroperators).Showwhatstateinformationtheiteratormustmaintainbetweencalls.Forsimplicity,youcanassumethatSissmallandeasilyfitsinmemory.Herearemoredetailsonanti-join:
/wiki/Relational_algebra#Antijoin
Problem2
ConsideranindexorganizedasaB+tree.TheleafnodescontainpointerstoatotalofNrecords,andeachblockthatmakesuptheindexhasmpointers.Wewishtochoosethevalueofmthatwillminimizesearchtimesonaparticulardiskdevice.
Wearegiventhefollowinginformation:
(a)Forthediskthatwillholdtheindex,thetimetoreadagivenblockintomemorycanbeapproximatedby(25+.02*m)milliseconds.The25millisecondsrepresenttheseekandlatencycomponentsoftheread,the.02*mmillisecondsisthetransfertime.(Thatis,asmbecomeslarger,thelargertheblockwillbeandthemoretimeitwilltaketoreaditintomemory.)
(b)Oncetheblockisinmemoryabinarysearchisusedtofindtherightpointer.Sothetimetoprocessablockinmainmemoryis(a+blog_2m)milliseconds,forsomeconstantsa,b.
("log_2m"meanslogbase2ofm.)
(c)Themainmemorytimeconstant"a"ismuchsmallerthanthediskseekandlatencytimeof25milliseconds.
(d)Theindexisfull,sothatthenumberofblocksthatmustbe
examinedpersearchislog_mN.
Questions:
(i)Whatvalueofmminimizesthetimetosearchforagivenrecord?AnapproximateanswerisOK.(HINT:Ifyoucomeupwithanequationwhichishardtosolvealgebraically,tryplugginginvaluestolocatetherootoftheequation.)Thevalueyouobtainshouldbeindependent
of"b"!
(ii)Whathappensastheseekandlatencyconstant(25ms)decreases?Forinstance,ifthisconstantiscutinhalf,howdoestheoptimummvaluechange?
Answer:
Let
T1
bethetimetakentoreadoneblockintomainmemoryand
T2
bethetimetoprocessthatblockinmemory.If
n
blocksareexaminedtosearchforagivenrecord,thenthetimetakenforthissearchis:
n*(T1
+T2)
Substitutingtheappropriateexpressionsforallthevariables,weget
T(m)=[(25+0.02m)+(a+blog2m)]*logmN
Ignoring
a
incomparisonwith
30,weget
T(m)=[25+0.02m+blog2m]*logmN
Substituting
log2m
by
ln(m)/ln(2)
and
logmN
by
ln(N)/ln(m),weget
T(m)=[(25+0.02m)/ln(m)]*ln(N)+b*ln(N)/ln(2)
Hence,tominimizeT(m),weneedtominimize
f(m)=(25+0.02m)/ln(m).Anumberoftechniquescanbeappliedtodeterminethevalueof
m
thatminimizesthisexpression.Forexample,takingthederivativeandsettingittozero(f'(m)=0),weget
m*(ln(m)-1)=1250
Thiscanbesolvedtoyieldanintegralvalueof
272.Itiseasytoverifythatthisvaluedoesindeedminimize
f(m)
andhence
T(m).
Seek+latencyconstantdecreases:
Letussplit
T(m)
intothreecomponentsasfollows(here,
ts
istheseek+latencytimeconstantwhichwas30intheabovecase):
Totalseekandlatencytime
Ta
=ts
*ln(N)/ln(m)
Totaltransfertime
Tb
=0.02*m*ln(N)/ln(m)
Totalbinarysearchtime
Tc
=b*ln(N)/ln(2)
Ofthese,
Tc
isindependentof
m,
Ta
isadecreasingfunctionof
m
whereas
Tb
isanincreasingfunctionof
m.If
ts
isverysmall,then
Ta
isnegligiblecomparedto
Tb.Hence,wemustchooseasmallvalueof
m
tominimizethesearchtime.Ontheotherhand,if
Ta
isverylargecomparedto
Tb,thentheformerdominatesandwechoosealargevalueof
m
tominimizesearchtimes.Fromtheselimitingcases,itiseasytoseethatif
ts
decreases,theoptimumvalueof
m
decreases.Inparticular,if
ts
=25/2=12.5,wegetanoptimumvalueof
155
(byproceedingalongthesamelinesasabove).
Problem3
Tobuildahashindexforamulti-attributesearchkey,wecanuseanapproachcalledpartitionedhashing.Thepartitionedhashfunctionisreallyalistofhashfunctions,oneforeachattributeinthesearchkey.SupposethatwewishtobuildapartitionedhashindexonR(A,B)with2nbucketsnumbered0to2n1.Inthiscase,thepartitionedhashfunctionconsistsoftwohashfunctionshAandhB.HashfunctionhAtakesavalueofAasinputandproducesaresultwithnAbits,andhBtakesavalueofBasinputandproducesaresultwith(nnA)bits.Thetworesultsareconcatenatedtogethertoproducetheresultofthepartitionedhashfunction,whichisthenusedtoindexthebuckets.TolocaterecordswithA=aandB=b,wesimplygodirectlytothebucketnumberedhA(a)hB(b)(inbinary).
(a)WhichbucketsdowehavetoexamineinordertolocateallrecordswithA=a?
(b)Supposewearegivenaquerymix.EachqueryinthismixwilleitheraskforrecordswithagivenvalueofA,oritwillaskforrecordswithagivenvalueofB(butneverboth).Withprobabilityp,thevalueofAwillbespecified.Giveaformulaintermsofn,nA,andpfortheexpectednumberofbucketsthatmustbe
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2024-2030年中国庭院经济行业市场现状及竞争格局与投资潜力研究报告
- 2024-2030年中国废物垃圾处理行业市场发展前瞻及投资战略研究报告
- 2024-2030年中国平面铣床行业市场发展趋势与前景展望战略分析报告
- 2024-2030年中国平头式耳机行业消费动态及竞争趋势预测研究报告
- 2024-2030年中国带式压滤机行业应用趋势及投资效益预测报告
- 2024-2030年中国工艺陶瓷行业市场深度调研及发展趋势与投资前景研究报告
- 2024-2030年中国工程建设行业市场发展分析及发展前景与投资战略研究报告
- 2024-2030年中国工业锅炉用煤行业销售态势及发展潜力预测报告
- 2024-2030年中国工业级硬酯酸钙行业运行动态与供需前景预测报告
- 2024-2030年中国工业服务行业市场发展趋势与前景展望战略分析报告
- 全国高中数学联赛加试平面几何汇编
- 《体育测量与评价》PPT课件.ppt
- 基层医疗卫生机构院长目标管理责任制绩效考核评分标准表
- 二年级语文下册 猜字谜(课堂PPT)
- 核心网设备安装工程质量控制点
- 推进创新券跨区域通用通兑的政策建议
- 新增旅游车可行性报告五篇范文
- 上市公司并购重组课件
- tf家族练习生招募报名表
- 多媒体展示系统施工组织设计方案
- 大医精诚原文及翻译
评论
0/150
提交评论