




Pipelining: Basic and Intermediate Concepts

  Introduction
  The Major Hurdle of Pipelining
    Pipeline Hazards
  Implementation
  Multi-cycle Operations
  The MIPS R4000 Pipeline
  Crosscutting Issues
    Scoreboarding
  Conclusion

CDA 5155 – Spring 2012
Copyright © 2012 Prabhat Mishra

Introduction
In early CPUs, deep combinational logic networks were used between state updates.
  Signal delays may vary widely across paths.
  New input cannot be provided to the network until the slowest paths have finished.
  Slow clock speed, slow processing rates.
[Figure: logic gate network]

Introduction
In pipelined design, logic networks are divided into shallow slices (pipeline stages).
  Delays through the network are made uniform.
  A new input can be provided to each slice as soon as its quick, shallow network has finished.
  Clock cycle is only as long as the slowest stage.

Pipelining
Pipelining is an implementation technique whereby multiple instructions are overlapped in execution.
Takes advantage of parallelism that exists among the actions needed to execute an instruction:
  fetch instruction from memory
  decode to figure out what to do
  read source operands
  execute
  write results

Simple RISC Datapath
[Figure: stages IF, ID, EX, MEM, WB; labels: Program Counter, Next PC, Inst., Reg., Load fr. Mem., Data]

Basic RISC Pipelining
Basic idea: Each instruction spends 1 clock cycle in each of the 5 execution stages.
During 1 clock cycle, the pipeline can process (in different stages) 5 different instructions.

Alternative Visualization

Visualization with Pipeline Registers

Limits of Pipelining
Increasing the number of pipeline stages in a given logic block by a factor of n
generally allows increasing clock speed and throughput by a factor of almost n.
  Usually less than n because of overheads such as latches and balance of delay in each stage.
But pipelining has a natural limit:
  At least 1 layer of logic gates per pipeline stage.
  Practical minimum is usually several gates (2-10).
  Commercial designs are rapidly nearing this point.

Few Related Terms
  Clock period = max{time delay of stage i, 1 ≤ i ≤ k} + other delay (e.g., skew, latch delay)
  Frequency: reciprocal of the clock period
  Speedup (k-stage pipeline, n instructions): k when n >> k
  Efficiency: ratio of the actual speedup to the ideal speedup
  Throughput: number of instructions that can be completed per unit time

Beyond Pipelining
There are some problems with clocked logic:
  High power dissipation for clock signal distribution throughout the chip
    Up to 40% or more of total power
  Clock signal timing differences (clock skew) between different portions of the chip
    Can be significant, interfere with proper execution
  Each clock cycle can only be as fast as the worst-case delay over all pipeline stages in the entire design.
    Many results end up waiting when they don’t need to.
An alternative: self-timed logic, asynchronous logic, dataflow circuits
  Each logic block uses explicit “handshaking” signals
  Globally asynchronous locally synchronous (GALS)
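The speedup entry under "Few Related Terms" only states the limiting value k. As a minimal sketch, assuming the usual idealized model (an unpipelined unit needs n·k stage-times for n instructions, a k-stage pipeline needs k + n − 1 cycles, and latch overhead and stalls are ignored), the speedup and efficiency can be computed as follows. The function names are illustrative and do not come from the slides.

```python
def pipeline_speedup(k: int, n: int) -> float:
    """Idealized speedup of a k-stage pipeline over unpipelined execution of n instructions."""
    if k < 1 or n < 1:
        raise ValueError("k and n must be positive")
    unpipelined_cycles = n * k        # every instruction passes through all k stages alone
    pipelined_cycles = k + (n - 1)    # k cycles to fill the pipe, then one result per cycle
    return unpipelined_cycles / pipelined_cycles


def pipeline_efficiency(k: int, n: int) -> float:
    """Ratio of the actual speedup to the ideal speedup k."""
    return pipeline_speedup(k, n) / k


if __name__ == "__main__":
    for n in (1, 5, 100, 10_000):
        print(n, round(pipeline_speedup(5, n), 3), round(pipeline_efficiency(5, n), 3))
```

Under this model the speedup is 1 for a single instruction and approaches k as n grows, which matches the "k when n >> k" statement.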
Pipeline Hazards
Hazards are circumstances which may lead to stalls (delays, “bubbles”) in the pipeline if not addressed.
Three major types:
  Structural hazards: Not enough HW resources to keep all instructions moving.
  Data hazards: Data results of earlier instructions not available yet.
  Control hazards: Control decisions resulting from earlier instructions (branches) not yet made; don’t know which new instruction to execute.

Structural Hazard Example
The processor has a combined instruction + data memory with only 1 read port.

Hazards Produce “Bubbles”

Textual View
A pipeline stalled for a structural hazard – a load with only one memory port

Three Types of Data Hazards
Let i be an earlier instruction, j a later one.
  RAW (read after write): j tries to read a value before i writes it.
  WAW (write after write): i and j write to the same place, but in the wrong order.
    Only occurs if more than 1 pipeline stage can write.
  WAR (write after read): j writes a new value to a location before i has read the old one.
    Only occurs if writes can happen before reads in the pipeline (in-order).

Data Hazard Example

Forwarding for Data Hazards

Another Forwarding Example

An Unavoidable Stall

Stalling in Midst of Instruction

Data Hazard Prevention
A clever compiler can often reschedule instructions to avoid a stall.
A simple example:
Original code:
  lw   r2, 0(r4)
  add  r1, r2, r3    # Stall happens here.
  lw   r5, 4(r4)
Transformed code:
  lw   r2, 0(r4)
  lw   r5, 4(r4)
  add  r1, r2, r3    # No stall needed.
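The effect of this rescheduling can be checked mechanically. The following is a minimal sketch, assuming a five-stage pipeline with forwarding in which the only data stall is a one-cycle load-use stall (an instruction that immediately follows a load and reads the loaded register). The `Instr` type and `count_load_use_stalls` are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "lw" or "add"
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def count_load_use_stalls(program: List[Instr]) -> int:
    """Count one-cycle stalls caused by using a loaded value in the very next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest in cur.srcs:
            stalls += 1
    return stalls


# The two sequences from the example above.
original = [
    Instr("lw", "r2", ("r4",)),
    Instr("add", "r1", ("r2", "r3")),
    Instr("lw", "r5", ("r4",)),
]
transformed = [original[0], original[2], original[1]]

if __name__ == "__main__":
    print("original stalls:", count_load_use_stalls(original))
    print("transformed stalls:", count_load_use_stalls(transformed))
```

Moving the independent second load between the first load and the add gives the loaded value one extra cycle to arrive, so the stall disappears.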
Simple RISC Pipeline Stall Statistics
[Figure: % of loads that cause a stall]

Control (Branch) Hazards
Suppose the new PC value is not computed until the MEM stage.
Then we must stall 3 clocks after every branch!

Performance of Pipelines with Stalls
Speedup = Avg. inst. time (unpipelined) / Avg. inst. time (pipelined)
      … = CPI unpipelined / CPI pipelined
      … = Pipeline Depth / (1 + Pipeline stall cycles per instruction)
        = Pipeline Depth / (1 + Branch frequency × Branch penalty)

Delayed Branches
Machine code sequence:
  Branch instruction
  Delay slot instruction(s)
  Post-branch instructions
Branch is taken (if taken) at this point [figure callout]

Scheduling the Branch-Delay Slot
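The last two forms of the speedup expression under "Performance of Pipelines with Stalls" are easy to evaluate numerically. A minimal sketch follows; the function names are illustrative, and the 20% branch frequency used in the example is an assumed value, not a figure from the slides. The 3-cycle penalty is the one quoted for a branch resolved in MEM.

```python
def speedup_with_stalls(pipeline_depth: int, stall_cycles_per_instr: float) -> float:
    """Pipeline Depth / (1 + pipeline stall cycles per instruction)."""
    return pipeline_depth / (1.0 + stall_cycles_per_instr)


def speedup_with_branches(pipeline_depth: int,
                          branch_frequency: float,
                          branch_penalty: float) -> float:
    """Same formula when branch stalls are the only source of stall cycles."""
    return speedup_with_stalls(pipeline_depth, branch_frequency * branch_penalty)


if __name__ == "__main__":
    # Assumed workload: 20% branches, each costing 3 stall cycles, on a 5-stage pipeline.
    print(speedup_with_branches(5, 0.20, 3))
```

With these assumed numbers the ideal speedup of 5 drops to roughly 3.1, which is why reducing the branch penalty matters.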
Simple RISC Datapath
[Figure: stages IF, ID, EX, MEM, WB; labels: Program Counter, Next PC, Inst., Reg., Load fr. Mem., Data]

Description of Pipe Stages

Data Hazard Detection

Hazard Detection Logic
Example: Detecting whether an instruction that has just been fetched needs to be stalled because of a preceding load.
  ID/EX.IR[rt] == IF/ID.IR[rs]
  ID/EX.IR[rt] == IF/ID.IR[rt]
  ID/EX.IR[rt] == IF/ID.IR[rs]

Forwarding Situations in DLX

Implementing Forwarding in HW

Early Branch Resolution

Original RISC Datapath
[Figure: stages IF, ID, EX, MEM, WB; labels: Program Counter, Next PC, Inst., Reg., Load fr. Mem., Data]
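The comparisons listed under "Hazard Detection Logic" can be sketched in code. This is a minimal sketch, assuming the common textbook pairing of those comparisons with opcode classes, which the text above does not spell out: a register-register ALU instruction in IF/ID is checked on both rs and rt, while a load, store, ALU-immediate, or branch is checked on rs only. The `Latch` type and the opcode-class strings are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical opcode classes for the instruction held in a pipeline latch.
REG_REG_ALU = "reg_reg_alu"
RS_ONLY_CLASSES = {"load", "store", "alu_imm", "branch"}


@dataclass
class Latch:
    opclass: str  # opcode class of the instruction in this pipeline register
    rs: int       # IR[rs] field
    rt: int       # IR[rt] field


def needs_load_stall(id_ex: Latch, if_id: Latch) -> bool:
    """True if the instruction in IF/ID must stall behind a load in ID/EX."""
    if id_ex.opclass != "load":
        return False                              # only a preceding load forces this stall
    if if_id.opclass == REG_REG_ALU:
        return id_ex.rt in (if_id.rs, if_id.rt)   # either source may be the loaded register
    if if_id.opclass in RS_ONLY_CLASSES:
        return id_ex.rt == if_id.rs               # only rs is compared for these classes
    return False


if __name__ == "__main__":
    load = Latch("load", rs=4, rt=2)
    print(needs_load_stall(load, Latch(REG_REG_ALU, rs=2, rt=3)))
```

The three branches of the function correspond to the three comparisons in the slide; everything about which opcode class triggers which comparison is the stated assumption.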
New Pipeline Logic

Control Instruction Statistics

Statistics on Taken Branches

Predict-Not-Taken

Static Branch Prediction
Earlier we discussed predict-taken, predict-not-taken static prediction strategies.
  Applied uniformly across all branches in the program.
Static analysis in the compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken.
  One way: backwards taken, forwards not taken.
If we can do better, it can help with static code scheduling to reduce data hazard stalls…
  Also may assist later dynamic prediction.

Prediction Helps Static Scheduling
          LD     R1,0(R2)
          DSUBU  R1,R1,R3
          BEQZ   R1,else
          OR     R4,R5,R6
          DADDU  R10,R4,R3
          J      after
  else:   DADDU  R7,R8,R9
          …
  after:
[Figure annotations: Potential load delay to fill; If-then-else control flow; Code movements to consider; Some data dependences; Which way will this branch go?; If case; Else case]

Some Static Prediction Schemes
Always predict taken
  34% mispredict rate on SPEC (range 9%-54%)
Backwards predict taken, forwards not taken
  In SPEC, more than ½ of forwards are taken!
  This does worse than the “always predict taken” strategy.
  Usually not better than 30-40% misprediction rate.
Better than either: use profile information
  Collect statistics on earlier program runs.
  Works well because individual branches tend to be strongly biased (taken or not) given average data.
  Bias remains stable across multiple runs.

Profile-Based Predictor Statistics
[Figure; surviving label: Floating-Point]

Predict-Taken vs. Profile-Based
[Figure; surviving labels: Floating-point; Instructions between mis-predictions]

Types of Exceptions (Interrupts, Faults)
  I/O device request, timer event
  Invoking OS services from a user program
  Tracing (single-stepping) through program
  Breakpoints
  Integer arithmetic overflow, divide by zero
  FP arithmetic anomaly (overflow, underflow, NaN, etc.)
  Page fault (page not in physical memory)
  Misaligned memory access
  Memory-protection violation (accessing memory not allocated to the process)
  Illegal (undefined or unimplemented) instruction
  Hardware malfunction
  Power-related interrupt (e.g. battery low, power failure)

Terminology Across Architectures

Exception Categorizations
  Synchronous vs. asynchronous: Is the event synchronized with program execution?
  User requested vs. coerced: Is the event caused intentionally by the user program?
  User maskable (can be disabled) or not: Can the event be disabled?
  Within instructions or between instructions: Does the event prevent the instruction from completing?
  Resume vs. terminate: Does the program continue from where it left off after the exception is handled, or does it stop?

Restartable Exceptions
Requirements:
  Exception may occur within an instruction.
  Program must continue after the exception is handled.
Examples: Virtual memory page fault.
Difficult because: Pipeline state must be saved.
One approach, for easy cases:
  1. Force a trap instruction into the pipeline on next IF.
  2. Clear the pipeline behind the faulting instruction.
  3. Exception handler saves PC of faulting instruction.

Precise vs. Imprecise Handling
Machines may support either or both modes of exception handling:
Precise exception handling:
  Correctly implement all possible combinations of exceptions in all circumstances.
  May be a requirement for some systems/applications.
  May be 10x slower!
  Easier for integer than floating-point.
  Useful for debugging code.
Imprecise exception handling:
  Only correctly implement the most common cases.
  Software may avoid some exceptions.
  Only statistical guarantee of correctness, through testing.

Exceptions in MIPS Pipeline
Instruction Fetch & Memory stages
  Page fault on instruction/data fetch
  Misaligned memory access
  Memory-protection violation
Instruction Decode stage
  Undefined/illegal opcode
Execution stage
  Arithmetic exception
Write-Back stage
  None!

Out-of-Order Exceptions
Consider the following code sequence:
  LW   IF  ID  EX  MEM WB
  ADD      IF  ID  EX  MEM WB
The ADD may cause an exception during IF, before LW causes an exception during MEM!
Can’t restart PC on the ADD!
Solution:
  Note the exception in a status vector, carried along.
  Disable writes for that instruction.
  Resolve all exceptions at a late stage (e.g. WB).
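The LW/ADD case under "Out-of-Order Exceptions" can be illustrated with a small model. This is a minimal sketch, assuming one instruction enters IF per cycle with no stalls and each instruction records at most one fault; `Instr`, `detection_order`, and `precise_order` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


@dataclass
class Instr:
    name: str
    fault_stage: Optional[str] = None  # stage in which this instruction faults, if any


def detection_order(program: List[Instr]) -> List[str]:
    """Order in which faults occur in time (by clock cycle of the faulting stage)."""
    events = []
    for i, ins in enumerate(program):
        if ins.fault_stage is not None:
            # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
            cycle = (i + 1) + STAGES.index(ins.fault_stage)
            events.append((cycle, ins.name))
    return [name for _, name in sorted(events)]


def precise_order(program: List[Instr]) -> List[str]:
    """Order in which faults are seen when each fault is only noted in a status
    vector and resolved as its instruction reaches WB, i.e. in program order."""
    return [ins.name for ins in program if ins.fault_stage is not None]


program = [Instr("LW", fault_stage="MEM"), Instr("ADD", fault_stage="IF")]

if __name__ == "__main__":
    print("raised in time order:", detection_order(program))
    print("resolved at WB:      ", precise_order(program))
```

In this model the ADD's fetch fault occurs in cycle 2 and the LW's memory fault in cycle 4, yet deferring resolution to WB presents the LW fault first, which is what allows a correct restart.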
Multi-cycle Operations for FP

Pipelined Multiple-Issue FPU

FPU Pipelining Issues in DLX
Notice instructions may complete out-of-order:
  MULTD  IF  ID  M1  M2  M3  M4  M5  M6  M7  ME  WB
  ADDD       IF  ID  A1  A2  A3  A4  ME  WB
  LD             IF  ID  EX  ME  WB
  SD                 IF  ID  EX  ME  WB
Raises the possibility of WAW hazards, and structural hazards in MEM & WB stages.
Structural hazards may occur especially often with non-pipelined DIV unit.
Out-of-order completion impacts exception handling.

Typical FP Code Seq. with Stalls
  MUL.D stalls in ID 1 cycle waiting for new value of F4 from MEM stage of L.D.
  ADD.D stalls 1 cycle in IF waiting for MUL.D to leave ID, then 6 cycles in ID waiting for new F0 to be returned by MUL.D stage M7.
  S.D stalls 6 cycles in IF waiting for ADD.D to leave ID, then 2 cycles in EX waiting for new F2 to be returned by ADD.D stage A4, then 1 more cycle in EX waiting for ADD.D to clear MEM stage.

                  Clock Cycle Number
Instruction       1     2     3     4     5     6     7     8     9     10    11    12    13    14    15    16    17
L.D   F4,0(R2)    IF    ID    EX    ME    WB
MUL.D F0,F4,F6          IF    ID    stall M1    M2    M3    M4    M5    M6    M7    ME    WB
ADD.D F2,F0,F8                IF    stall ID    stall stall stall stall stall stall A1    A2    A3    A4    ME    WB
S.D   F2,0(R2)                      IF    stall stall stall stall stall stall ID    EX    stall stall stall ME

Issues in Multi-Cycle Operations
  Stall for RAW is longer and more frequent (Fig. A.33)
  WAW is possible; WAR is not (why?)
  Structural hazard possible for non-pipelined unit
  Multiple WBs are likely (Fig. A.34)
Handling hazards at Issue (ID) stage:
  Check structural hazards: functional unit, WB port
  Check RAW hazards: issue with forwarding
  Check WAW hazards: do not issue, to make sure writes occur in order
  Detect and stall instruction before MEM and WB stages

FP Stall Statistics per FP operation

ISA Design Impacts Pipelining
Variable instruction lengths & runtimes:
  Introduces delays due to pipeline inequities.
  Complicates hazard detection & precise exceptions.
Sophisticated addressing modes:
  Post-autoincrement complicates hazard detection, restarting, introduces WAR & WAW hazards.
  Multiple-indirect modes complicate pipeline control & timing.
Self-modifying code:
  What if you overwrite an instruction in the pipe?
Implicit condition codes:
  WAR hazards, restarts
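The out-of-order completion noted under "FPU Pipelining Issues in DLX" follows directly from the differing numbers of execute stages. This is a minimal sketch, assuming one instruction is fetched per cycle with no stalls, using the execute-stage counts visible in the diagram (7 for MULTD, 4 for ADDD, 1 for LD). The function names and the `EX_STAGES` table are hypothetical.

```python
from typing import Dict, List, Tuple

# Number of execute stages per operation, read off the pipeline diagram.
EX_STAGES: Dict[str, int] = {"MULTD": 7, "ADDD": 4, "LD": 1}


def wb_cycles(program: List[str]) -> List[Tuple[str, int]]:
    """Return (op, cycle in which it reaches WB), with instruction i fetched in cycle i + 1."""
    result = []
    for i, op in enumerate(program):
        fetch = i + 1
        # One cycle each for IF, ID, ME around the execute stages; WB is the next cycle.
        wb = fetch + 1 + EX_STAGES[op] + 1 + 1
        result.append((op, wb))
    return result


def completion_order(program: List[str]) -> List[str]:
    """Operations sorted by the cycle in which they write back."""
    return [op for op, _ in sorted(wb_cycles(program), key=lambda pair: pair[1])]


def wb_conflicts(program: List[str]) -> Dict[int, List[str]]:
    """Cycles in which more than one instruction needs the write-back port."""
    by_cycle: Dict[int, List[str]] = {}
    for op, cycle in wb_cycles(program):
        by_cycle.setdefault(cycle, []).append(op)
    return {cycle: ops for cycle, ops in by_cycle.items() if len(ops) > 1}


if __name__ == "__main__":
    print(wb_cycles(["MULTD", "ADDD", "LD"]))
    print(completion_order(["MULTD", "ADDD", "LD"]))
    print(wb_conflicts(["ADDD", "LD", "LD", "LD"]))
```

The first sequence completes in the reverse of its issue order, and the second shows two instructions arriving at WB in the same cycle, the structural hazard that the issue stage has to detect.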
Real MIPS R4000 “Super Pipeline”
  IF, IS - Instruction cache fetch, First & Second halves.
  RF - Inst. decode, Register Fetch, hazard check…
  EX - Execution (EA calc, ALU op, target calc…)
  DF, DS - Data cache access, First & Second halves.
  TC - Tag Check, did cache access hit?
  WB - Write-Back for loads & register-register ops.

R4000: Two-Cycle Load Delay

R4000: Three-Cycle Branch Delay

R4000 FP Functional Unit Stages
  U – Unpack floating-point numbers
FP adder functional unit stages:
  A – Mantissa ADD stage
  R – Rounding stage
  S – Operand shift stage
FP multiplier functional unit stages:
  E – Exception test stage
  M – First stage of multiplier
  N – Second stage of multiplier
FP divider functional unit stages:
  D – Divide pipeline stage

Latency and Initiation Interval
FP Instruction    Latency   Initiation interval   MIPS R4000 pipe stages
Add, subtract     4         3                     U, S+A, A+R, R+S
Multiply          8         4                     U, E+M, M, M, M, N, N+A, R
Divide            36        35                    U, A, R, D^27, D+A, D+R, D+A, D+R, A, R
Square root       112       111                   U, E, (A+R)^108, A, R
Negate            2         1                     U, S
Absolute value    2         1                     U, S
FP compare        3         2                     U, A, R
Legend: U – unpack; A – mantissa add; R – round; S – shift; E – exception test; M – multiply 1st stage; N – multiply 2nd stage; D – divide.
Notes: X+Y means both units are used in the same clock cycle; (A+R)^108 means the pair of units is used on 108 consecutive cycles.

FP Multiply followed by Add
                    Clock cycle
Op    Issue/Stall   0    1    2    3    4    5    6    7    8    9    10   11   12
mul   Issue         U    E+M  M    M    M    N    N+A  R
add   Issue              U    S+A  A+R  R+S
      Issue                   U    S+A  A+R  R+S
      Issue                        U    S+A  A+R  R+S
      Stall                             U    S+A  A+R  R+S
      Stall                                  U    S+A  A+R  R+S
      Issue                                       U    S+A  A+R  R+S
      Issue                                            U    S+A  A+R  R+S

The MIPS R4300 Pipeline
  Manufactured by NEC
  64-bit processor, implements MIPS64 ISA
  Used in embedded applications
    Nintendo-64 game processor, network router, …
  Multiple EX stages for floating-point pipeline
    Out-of-order completion, precise exceptions
  NEC VR4122: Integer datapath, software for FP operations
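The latency and initiation-interval columns of the R4000 table can be combined to estimate how long a run of independent operations of one type occupies its unit. This is a minimal sketch, assuming that latency counts the cycles one operation spends in the FP pipeline and that back-to-back operations of the same type can start one initiation interval apart. The dictionary keys and function names are illustrative.

```python
from typing import Dict, List, Tuple

# (latency, initiation interval) in cycles, copied from the table above.
R4000_FP: Dict[str, Tuple[int, int]] = {
    "add": (4, 3),
    "multiply": (8, 4),
    "divide": (36, 35),
    "sqrt": (112, 111),
    "negate": (2, 1),
    "abs": (2, 1),
    "compare": (3, 2),
}


def issue_cycles(op: str, count: int, first_issue: int = 0) -> List[int]:
    """Earliest issue cycles for `count` independent operations of the same type."""
    _, interval = R4000_FP[op]
    return [first_issue + i * interval for i in range(count)]


def cycles_to_finish(op: str, count: int) -> int:
    """Cycles from the first issue until the last of `count` operations completes."""
    if count < 1:
        raise ValueError("count must be positive")
    latency, interval = R4000_FP[op]
    return latency + (count - 1) * interval


if __name__ == "__main__":
    print(issue_cycles("add", 3))
    print(cycles_to_finish("multiply", 3))
    print(cycles_to_finish("divide", 2))
```

Because the divide and square-root initiation intervals are nearly as long as their latencies, those units behave as almost unpipelined, while adds and multiplies overlap partially.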
RISC ISA and Efficiency of Pipelines
Simple instruction set
  Easier to schedule code to improve performance
Static scheduling by compiler
Dynamic scheduling by hardware
  Leads to out-of-order execution (completion)
  Requires mechanism to ensure correct execution
  Scoreboarding

Scoreboarding
Technique for implementing an instruction queue that supports dynamic reordering.
Developed on CDC 6600 (decades ago).
Reordering must check WAR/WAW hazards:
  DIVD  F0,F2,F4      # Long-running
  ADDD  F10,F0,F8     # Depends on DIV
  SUBD  F8,F8,F14     # Anti-depends on ADD
Goal: Begin execution of instructions as early as possible.

Simple scoreboarded datapath

Pipeline with Scoreboarding
  0. (F) Fetch instruction from cache or prefetch buffer
  1. (I) Issue inst. to an execution path (when no structural/WAW hazards)
  2. (R) Read operands (when no RAW hazards remain)
  3. (E) Execute instruction (possibly multi-cycle)
  4. (W) Write results (when no WAR hazards remain)
[Figure: Instruction Fetch, Instruction Decode, Instruction Issue, pre-issue buffer, read operands, pre-execution buffers, execution unit 1, execution unit 2, …, post-execution buffers, write results, Scoreboard / Control Unit]
1. Instruction Issue (IS) Stage
(Replaces first half of ID stage)
  Receive newly-fetched instruction.
  Decode binary instruction format.
  Check for structural hazards:
    Instruction needs execution unit currently in use, whose initiation interval hasn’t passed?
  Check for WAW hazards:
    Instruction wants to write to a register that an active instruction (issued, but not yet finished) wants to write to? Bad if they finish out-of-order!
  Stall all current (& future) instruction issuing, until none of these hazards remain.
  Issue instructions (in-order) to the appropriate execution units & track status on scoreboard.

2. Read Operands (RO) Stage
(Replaces second half of ID stage)
  Receive instruction issued to functional unit.
  Check for RAW hazards: Are all source operands available yet?
    If no: Hold instruction in a pre-execution buffer.
      If buffer has only 1 entry, this and all not-yet-issued instructions using this functional unit must wait.
    If yes: Read operands from register file, & start instruction down the execution unit’s pipeline.

3. Execution (EX) Stage
(Replaces old EX stage)
  Once operands are received, begin execution of the instruction in the execution unit.
  Execution may take multiple cycles.
  When result is ready, notify scoreboard of instruction completion.

4. Write Result (WR) Stage
(Replaces WB stage)
  Receive completed instruction & its result from execution unit.
  Check for WAR hazards:
    Does any previously-issued instruction that has not yet read its operands depend on the old value we are about to overwrite? (Does it anti-depend on us?)
    While yes: Stall instruction in a post-execution buffer.
    When no: Write instruction result to register file.

Scoreboard Implementation
One typical implementation uses three tables:
  Instruction status, for each instruction on the scoreboard:
    Which stage of execution is the instruction currently in?
  Functional Unit (FU) status, for each FU:
    What instruction (if any) is being processed?
    If inst. is in RO stage, then for each operand:
      What register is the operand coming from?
      Is the operand ready? If not, which FU will produce the operand?
  Register result status, for each reg. in the ISA:
    Which currently-running FU (if any) is scheduled to overwrite the given register?

Functional Unit Status Table
For each functional unit, the following fields:
  Busy – Is the unit busy (Yes/No)?
  Op – Which exact opcode to perform in the FU?
  Fi – Destination register of instruction in the FU
  Fj, Fk – Source registers of instruction
These fields are only needed during RO stage:
  Qj, Qk – FUs to write new values of source registers, or 0
  Rj, Rk – Are operands Fj, Fk ready? (Yes/No)
Register result status table has only 1 field:
  Result – Which currently-executing FU will write its result to this register?

#1: Scoreboard After 2nd LD’s EX

#2: Just before MULTD’s WR stage

#3: Just before DIVD’s WR stage

Scoreboarding Logic
In the table below:
  FU = functional unit used by given instruction
  D, S1, S2 = given instr’s destination & source regs
  op = operation to be performed
  Result[reg] = Register result status table entry for register identified by reg
[Table; surviving annotations: (completed by end of cycle), (true @ start of cycle), (Avoids WAW), (Avoids WAR), (Avoids RAW), (Release other writers]
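The stage checks described above (structural and WAW at issue, RAW at read operands, WAR at write result) can be sketched with the functional unit status fields Busy, Op, Fi, Fj, Fk, Qj, Qk, Rj, Rk and the register result table. This is a minimal sketch, assuming the conventional textbook form of the wait conditions, since the logic table itself did not survive in this text. The class names and the unit names `div`, `add1`, `add2` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FU that will produce Fj / Fk (None if not pending)
    qk: Optional[str] = None
    rj: bool = False           # source ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, fu_names: List[str]) -> None:
        self.fu: Dict[str, FUStatus] = {name: FUStatus() for name in fu_names}
        self.result: Dict[str, str] = {}  # register -> FU scheduled to write it

    def can_issue(self, fu: str, dest: str) -> bool:
        # No structural hazard (unit free) and no WAW hazard (no pending writer of dest).
        return not self.fu[fu].busy and dest not in self.result

    def issue(self, fu: str, op: str, dest: str, s1: str, s2: str) -> None:
        assert self.can_issue(fu, dest)
        qj, qk = self.result.get(s1), self.result.get(s2)  # looked up before dest is claimed
        self.fu[fu] = FUStatus(True, op, dest, s1, s2, qj, qk, qj is None, qk is None)
        self.result[dest] = fu

    def can_read_operands(self, fu: str) -> bool:
        # No RAW hazard: both sources are available.
        return self.fu[fu].rj and self.fu[fu].rk

    def read_operands(self, fu: str) -> None:
        assert self.can_read_operands(fu)
        st = self.fu[fu]
        st.rj = st.rk = False
        st.qj = st.qk = None

    def can_write_result(self, fu: str) -> bool:
        # No WAR hazard: no other active unit still has to read the old value of Fi.
        fi = self.fu[fu].fi
        return all((f.fj != fi or not f.rj) and (f.fk != fi or not f.rk)
                   for name, f in self.fu.items() if name != fu and f.busy)

    def write_result(self, fu: str) -> None:
        assert self.can_write_result(fu)
        for f in self.fu.values():          # wake up units waiting on this result
            if f.qj == fu:
                f.qj, f.rj = None, True
            if f.qk == fu:
                f.qk, f.rk = None, True
        del self.result[self.fu[fu].fi]
        self.fu[fu] = FUStatus()


if __name__ == "__main__":
    sb = Scoreboard(["div", "add1", "add2"])
    sb.issue("div", "DIVD", "F0", "F2", "F4")
    sb.issue("add1", "ADDD", "F10", "F0", "F8")
    sb.issue("add2", "SUBD", "F8", "F8", "F14")
    sb.read_operands("add2")
    print("ADDD can read operands:", sb.can_read_operands("add1"))  # RAW on F0
    print("SUBD can write F8:", sb.can_write_result("add2"))        # WAR with ADDD
```

Running the DIVD/ADDD/SUBD sequence from the Scoreboarding slide through this model shows ADDD held at read operands by the RAW dependence on F0, and SUBD held at write result until ADDD has read the old F8.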