版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Spring20026.823ComputerSystemArchitectureDatapathforDLXSpring2002ProblemSet#2Studentsareallowedtocollaborateingroupsofupto3people.Agrouphandsinonlyonecopyofthesolutiontoaproblemset.Homeworkassignmentsaredueatthebeginningofclassonthesuedate.Tofacilitategrading,eachproblemmustbestapledseparately.Homeworkwillnotbeacceptedoncesolutionsarehandedout.Problem1:MicroprogrammingandBus-BasedArchitecturesProblem1.AHowmanycyclesdoesittaketoexecutethefollowinginstructionsinthemicrocodedDLXmachine?UsethestatesandcontrolpointsfromDLX-Controller-2andassumeMemorywillnotassertitsbusysingal.lilStTLlCtlOUCyclesADDR3,R2,R1ADDIR2,R1,#4LW Rl,0(R2)BEGSRlflabel#(R1==0)BEG3Rl,label#(R1!=0)J labelJR R1JALlabelJALRR1Whichinstructiontakesthemostcyclestoexecute?Whichinstructiontakesthefewestcyclestoexecute?Problem1.BBenBitdiddleneedstocomputefactorialsforsmallnumbers.RealizingthereisnomultiplyinstructioninthemicrocodedDLXmachine,heusesthefollowingcodetocalculatethefactorialofanunsignednumbern.result.=1;for(iresult.=1;for(i=0;i<n; {二wrr一匸for(jresult;=0;jresult<i;j-+)ten'::';ThevariablesI,j,n,temp,andresultareunsigned32-bitvalues.WritetheDLXassemblythatimplementsBen'factorialcode.UseonlytheDLZinstructionsthatcanbeexecutedonthemicrocodedDLXmachine(ALU,ALUi,LW,SW,J,JAL,JR,JALR,BENQ,andBENS).ThemicrocodedDLXmachinedoesnothavetobepreserved.HowmanyDLXinstructionsareexecutedtocalculateafactorial?Howmanycyclesdoesittaketocalculateafactorial?Again,usethestatesandcontrolpointsfromDLX-Controller-2andassumeMemorywillnotassertitsbusysignal.Fh匚tenalCycles013NProblem1.CAlyssaP.HackertellsBenthathisfactorialcodewillrunmuchfasterifheimplementsanunsignedmultiplyinstructioninthemicrocodedDLXmachine.ThenBencanreplacetheinnerloopinstructionswiththenewmultiplyinstruction.ThedetailsofAlyssa'snewproposedunsignedmultiplyinstructionare:▼Rsl(Rs▼Rsl(Rs2times)ThevalueofRs1isaddedRs2timesandtheresultstoredintoRd.Rs1andRs2aretreatedasunsigned32-bitvalues.IfRs2orRs1is0,thentheresultofRdwillalsobe0.TheformatoftheMULUinstructionisR-type.InordertobeabletowritemicrocodeforMULU,Alyssaaddsanadditionalregister,TO(33),totheregisterfile.Thisregister,likethePCregister,isnotvisibletotheprogrammer.Shealsoadds33asaninputtotheregisterfilemultiplexer.UsingWorksheet1,writemicrocodetoimplementAlyssa 'snewunsignedmultiplyinstruction.InWorksheet1,therepresentationofthenextstateisdifferentfromwhatwaspresentedinlecture.Thelasttwocolumnsrepresentsa2-bitfieldwithfourpossiblevalues:N,J,Z,andD.IUBrisN(next),thenthenextstateissimply(currentstate=1).IfitisJ(jump),thenthenextstateisunconditionallythestatespecifiedintheNextStatecolumn(i.e.,it uhcontiitionalmicrobranch).IfitisZ(branch-if-zero),thenthenextstatedependsonthevalueoftheAL'zerooutputsignal(i.e.,it'aconditionalmicrobranch).Ifzeroisasserted(==1),thenthenextstateisthatspecifiedintheNextStatecolumn,otherwise,itis(currentstate+1).UBisD(dispatch),thentheFSMlooksattheopcodeandfunctionfieldsintheIRandgoesintothecorrespondingstate.Forthisproblemset,weassumethatthedispatchgoestothelabeled(DLX-instruction-name+”.Forexample,iftheinstructionintheIRisSW,thenthedispatchwillgotostateSW0.TheALUperformsoperationsspecifiedbytheALUOp,whichisdeterminedbytheALUcontrollogicblock.AssumetheALUcanperformthefollowingoperations:
ALUOpALUResultOutputCOPYAACOPYBBrINCA1A+lDECA1A-l1INCA4A+4DECA4A-4ADDA+BSUBA-BHowmanycyclesdoesitfortheMULUinstructionfordifferentvaluesofRs2?Rs2Cycles013NProblem1.DWithAlyssa'snewunsignedmultiplyinstruction,Beneliminatestheinnerloopofhisoriginalfactorialcodeandsimplifiesittothefollowing.result=1;for(i=1;1<=n;i++}{result-result*1;}HelpBenwriteDLXassemblycodetoimplementfactorialusingthenewMULUinstruction.Again,useR1fornandR2forresult.Attheendofyourcode,R2mustcontainthecorrectvalue.Youdonothavetopreservethevaluesofanyotherregisters.HowmanyDLXinstructionsareexecutedtocalculateafactorial?Howmanycyclesdoesittaketocalculateafactorial?Again,assumeMemorywillnotassertitsbusysignal.FactorialCycles0123NProblem1.ECombiningamicrocontrollerandtheDLXbus-baseddatapath(L4-5)givesusacompleteworkingcomputerthatcanrunasubsetoftheDLXISA.Besidesrequiringmuchmorememory,AlyssatellsBenanotherreasonwhyusingtheoriginalDLXMicrocontroller(L4-9)wasavadidea.AmachinewiththeoriginalcontrollerwouldhaveamuchIongercycletimethanamachineusingthesecondmicrocontroller(L4-15).Benca'understandwhythisistrue.BelowarethedelaysofthehardwarepartsusedtoimplementtheDLXbus-baseddatapachandthefirstversionoftheDLXMicrocontroller(L4-9).Lehi?-紀mpmiwofregistersIR.A:B.N1A,pPCte-q-clock-to-qtmieofregistersIR,AB,MA.pPCtAiu一timeALUtakestogenerateresultmidzerot„ceud-delayofasignextendertgcc—delayofamultiplexertt-,—delayfromwhendataisrittentobustowhenitbecomesstablet亦為花—delayrofarn-UatebufferCrf-delayoftheregisterfileMm-delayofmemory-trom-delayoftheROMtALV_c0EtioL-delayforALUControlAssumethattALU,tROM,tMEN,havecomparablevaluesandthatthesedelaysarebiggerthanthedelaysforothercomponents.Usingthefirstversionofthemicrocontroller,whichmicrocodeinstructioninvokesthecriticalpathofthemachine?Describethecriticalpath.WhatistheminimumclockperiodthatthecompleteDLXbus-basedmachinecanrunat?AssumeMemorywillnotassertitsbusysignal.Problem2:PipelineHackingInspiredbyhissuccesswiththeMACCinstructioninthelastproblemset,BenBitdiddlecomesupwiththefollowingnewinstructionformatcalledBIF(forBen'InstructionFormat)thathewantstoaddtotheDLXISA:opcodelrf1rf2opcode2rf4rf565 5 65 5Thesemanticsofthenewinstructionwouldbethis:rf5<-(=floplWhereop1andop2areALUoperations.Forexample,ADDR1,R2:ADDR3,R4wouldbecomputedasfollows:RT<-(R1+R2)+R3Toimplementthenewinstructionformat,BendecidestoaddanALUtothememoryphaseofthepipelined,fully-bypassedimplementationoftheDLXdatapathdiscussedinlecture.Old-styleDLXinstructionswouldstillusetheALUintheexecutephasewhileBIFinstructionswouldusebothALUs.ThefirstALUinthememorypahse.Thenewpipelinewouldlooklikethis:IFIDEXMAWBFetchphaseDecodeandregisterfetchphaseExecutephase withonfilialALUMemoryphase withnewALUWrite-backphaseInaddition,theregisterfileintheolddatapathisreplacedwitharegisterfilewiththreereadportsandonewriteportsothatallthreeoperandscanbereadatthesametime.Problem2.AGiveacodeexamplethatshowshowtoycangetbetterperformaneeusingBIFinstructions.Provideboththeoriginalols-styleDLX,codeandthecodethatusestheBIFinstructions.Theoriginalcodeshouldcontainatleastsixinstructions.WhatisthemaximumpossibleimprovementinperformaneeusingBIFinstructions?Problem2.BShowalldatahazardsthatcancausestallsandprovideacodeexampleforeachcase.Youshouldconsiderbothold-styleDLXinstructionsandinstructionsusingBensnewformat.Youmayassumethatthedatapathisfully-bypassed.Donotconsiderjumpsorbranchesfornow.Stillignoringjumpsandbranches,howperformaneechangeifnon-BIFinstructionsusedthenewALUintheMAphaseinstasdoftheoriginalALUintheexecutephase?Problem2.CWritetheequationsforws,we,re1,re2forthenewdatapathwithnon-BIFinstructionsusingtheoriginalALUintheexecutephase.Writethestallsignalusingws,we,re1,andre2.Youmayneedothersignals.Thesignalsfortheoriginaldatapathareprovidedhereforyourconvenience.Again,donotconsiderjumpsorbranchesfornow.ws二caseopcodeALU => rf3ALUi’LW刁rf2JAL.JALRnR31we二caseopcodeALUtALUi,LW (wsRO)JALrJALR ron… noffrel二caseopcodeALU.ALUi„LW.SW.BEQZ,J尺JALRnonJ.JAL 0offre2=caseopcodeALU.SW =>on( (offcstallstall=(rfID=wsE)((opcodeE=LWE)((wsE(R0)(relD+(rf2D=w5E)((opcodeE=LWE)((w^E(R0)(re2DProblem2.DNowconsiderjumpsandbranches.Whatadditionalhazardscanoccur?Giveanexampleforeachcase.Withthenewinstructionformat,Benthinksthatwecanspeedupconditionalbranchesifweallowaninstructionthatcombinesthecomparewiththebranch.Forexample,5LTR1.K21 labelwouldmean:il(R1<R2)thenbranchtoLabelThefirstinstruction(inthisexample,theSLTinstruction)wouldbeperformedontheSLUintheEXpahse.Theresult(a0or1),insteadofbeingwrittentotheregisterfile,wouldbepassedtotheMAphasewherethezerotestwouldbeperformedonthenewALU.Howmanydelayslotswillthistypeofinstructionrequiretoavoidanystalls?精品文档Howcouldyoureducethenumberofdelayslotsthatareneeded,withoutintroducinganynewstallconditionsorkillinginstructionsinthepipeline?Foreachcasethatyouconsider,arguewhateffectitwillhaveontheclockperiod.Eachcaseshouldcorrespondtoanimplementationwithadifferentnumberofdelayslots.Giventheoptionsyouinvestigatedabove,argueinafewsentenceswhichoftheseoptionsisthebest.Considerdelayslots,stalls,circuitsize,andclockperiod.Problem2.EBenfindsthattheadditionalreadportintheregisterfileisincreasingthelengthofthecriticalpathintheprocessor,andthattheycannotclockthenewdatapathatashighaspeedastheoriginal.Totryandsolvetheproblem,heisgoingtotryandusetheoriginalregisterfile.Sincetheoriginalregisterfileonlyhastworeadports,onlytwooftheoperandscanbereadintheIDphase.ThethirdoperandisgoingtobereadintheEXphase,inparallelwiththefirstALUoperation.Whatotherchangesareneededtomakethisschemework?Howdothesechangesaffecttheperformaneeoftheprocessor?AlyssathinksthatBencansolvehisproblembyaddingasecondregisterfile,identicaltothefirst.Howwouldthisschemework?HowdoestheperformanceofAly'aBchemecompatrtothetwothatBentried?Problem2.FWiththenewBIFinstructions,howwillcodesizechange?Willthishaveaneffectonperformanee?Problem3:CacheAccess-Time&PerformanceBenistryingtodeterminethebestcacheconfigurationforanewprocessor.Heknowshowtobuildthreekindsofcaches:direct-mappedcaches,2-wayser-associativecaches,andsmallfullyassociativecaches.Thegoalistofindthebestcacheconfigurationwiththegivenbuildingblocks.Problem3.ASinceheonlyknowshowtobuildverysmallfullyassociativecaches,Bendecidedtouseeitherdirect-mappedor2-wayset-associativeasthebasiccacheconfiguration.Hewantstoknowhowthesetwodifferentconfigurationsaffecttheclockspeedandthecachemiss-rate,andchoosetheonethatprovidesbetterperformaneeintermsofaveragelatencyforaload.Problem3.AAccesstimeiDMThefollowingdiagramshowshowadirect-mappedcacheisorganized.Toreadawordfromthecache,theinputaddressissetbytheprocessor.Thentheindexportionoftheaddressisdecodedtoaccesstheproperrowinthetagmemoryarrayandinthedatamemoryarray.Theselectedtagiscomparedtothetagportionoftheinputaddresstodetermineiftheaccessisahitornot.Atthesametime,thecorrespondingcacheblockisreadandtheproperlineidselectedthroughaMUX.Inthememoryarray,eachrowcorrespondstoarowinthecache.Forexample,arowinthetagmemoryarraycontainsonetagandtwostatusbits(validanddirty)forthecacheline.Fordirect-mappedcaches,arowinthedataarrayholdsonecacheline.Nowwewanttocomputetheaccesstimeofthecache.Assumea32-KBcachewith8-word(32-byte)cachelines.Theaddressis32bits,andtwoLSBoftheaddressisignoredsinceacacheaccessisword-aligned.Thedataoutputisalso32bits,andtheMUXselectsonewordoutoftheeightwordsinacacheline.UsingthhedelayequationsgiveninTable3-1,fillinthecolumnforthedirect=mapped(DM)cacheinTable3-1.Intheequationforthedataoutputdriver,associativity”referstotheassociativityofthecache(1ordirect-mappedcaches,AforA-wayset-associativecaches).ComponentDelayequation(ps)DM(ps)SA(ps)Decoder200k(#ofindexbits)—1000TagDataMeniorsrarray200xlogj ofrows)-200xlog:侍ofbirsmarow)+1000TagDaraComparator200x(#oftagbits)+1000[N-to-1MUX500xlog2N+1000BufFerdriver2000D自祐outputdriver500<(as5ociari\iiv)+1000Validoutput1000Table3-1:DthyofeachCacheComponentWhatidthecriticalpathofthisdirect-mappedcacheforacacheread?Whatistheaccesstimeofthecache(thedelayofthecriticalpath)?Tocomputetheaccesstime,assumethatagate(AND,OR)delayis500(ps).IftheCPUclockis150MHz,howmanyCPUcyclesdoesacacheaccesstake?Problem3.B Accesstime:SATheimplementationofa2-wayset-associativecacheisshowninthefollowingdiagram.Theindexpartoftheinputaddressisagainusedtofindtheproperrowinthedatamemoryarrayandthetagmemoryarray.Inthiscase,however,eachrowcorrespondstotwocachelines(onecacheset).Arowinthedatamemoryholdstwocachelines(for32-bytescachelines,64bytes),andarowinthetagmemoryarraycontainstwotagsandstatusbitsforthosetags(2bitspercacheline).Thetagmemoryandthedatamemoryareaccessedinparallel,buttheoutputdatadriverisenabledonlyifthereisacachehit.Assumethetotalcachesizeis32-KB(eachwayis16-KB)andallotherparameters(suchastheinputaddress,cacheline,etc.)arethesameaspart3.A.Computethedelayofeachcomponentandfillinthecolumnfora2-wayset=associativecacheinTable3-1.Whatisthecriticalpathofthe2-wayset-associativecache?Whatistheaccesstimeofthecache(thedelayofthecriticalpath)?Whatisthemainreasonthatthe2-wayset-associativecacheisslowerthanthedirect-mappedcache?IftheCPUclockis150MHz,howmanyCPUxyxlesdoesacacheaccesstake?
InputAddressProblem3.CMiss-rateanalysisInputAddressProblem3.CNowBenisstudyingtheeffectofset-associativityonthecacheperformanee.Sincehenowknowstheaccesstimeofeachconfiguration,hewantstoknowthemiss=rateofeachone.Forthemiss-rateanalysis,Benisconsideringtwosmallcaches:adirect=mappedcachewith4lineswith16bytes/line,anda2-waqyset-associativecache,usingaleastrecentlyusedreplacementpolicy,with4lineswith16bytes/line.Benteststhecachebyaccessingthefollowingsequeneeifhexadecimalbyteaddress,startingwithemptycaches.Completethefollowingtablesforboththedirect-mappedcacheandthe2-wayset-associativecacheshowingtheprogressionofcachecontainsasaccessesoccur(inthetable,my”=
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 财务管理标准推行计划
- 娱乐休闲行业前台服务心得
- 互联服务销售工作总结
- 电商仓库管理员服务职责
- 纺织原料采购工作总结
- 语言学校前台工作总结
- 水产加工厂保安工作总结
- 第二单元 一年级下教案
- 2023年四川省德阳市公开招聘警务辅助人员辅警笔试自考题2卷含答案
- 2022年江苏省宿迁市公开招聘警务辅助人员辅警笔试自考题1卷含答案
- 国内外天然植物染料的应用及发展现状
- 安徽省马鞍山市2023-2024学年高一上学期期末考试物理试题(含答案解析)
- 心理健康对学生学习成绩的影响
- 食品生产企业员工食品安全培训
- 小学数学综合素质评价专项方案
- 模型预测控制现状与挑战
- 闽教版2023版3-6年级全8册英语单词表
- MOOC创新创业与管理基础(东南大学)
- 华为财务分析报告
- 快速出具旧机动车评估报告
- 人员保有培训课件
评论
0/150
提交评论