![计算机体系结构8_第1页](http://file3.renrendoc.com/fileroot_temp3/2022-3/28/a4891a70-888d-498c-9b8d-709fb85a23b8/a4891a70-888d-498c-9b8d-709fb85a23b81.gif)
![计算机体系结构8_第2页](http://file3.renrendoc.com/fileroot_temp3/2022-3/28/a4891a70-888d-498c-9b8d-709fb85a23b8/a4891a70-888d-498c-9b8d-709fb85a23b82.gif)
![计算机体系结构8_第3页](http://file3.renrendoc.com/fileroot_temp3/2022-3/28/a4891a70-888d-498c-9b8d-709fb85a23b8/a4891a70-888d-498c-9b8d-709fb85a23b83.gif)
![计算机体系结构8_第4页](http://file3.renrendoc.com/fileroot_temp3/2022-3/28/a4891a70-888d-498c-9b8d-709fb85a23b8/a4891a70-888d-498c-9b8d-709fb85a23b84.gif)
![计算机体系结构8_第5页](http://file3.renrendoc.com/fileroot_temp3/2022-3/28/a4891a70-888d-498c-9b8d-709fb85a23b8/a4891a70-888d-498c-9b8d-709fb85a23b85.gif)
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、1Branch InstructionsSima, Fountain and KacsukChapter 8CSE3304 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19972Major Chapter GoalslTo understand how to minimise the performance degradation of branches Basic approach to branch handling Delayed branching Branch processi
2、ng Multi-way branching Guarded Execution David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19973Types of branch instructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch toSubroutineReturn fromSubroutineLoop closingconditionalbranchOtherconditional
3、branchAlpha:BR taPowPC: b taBSR tabl taRET tabclrBTL R1, tabdnz taBNE R1, tabne ta David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19974Types of branch instructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch toSubroutineReturn fromSubroutineLoo
4、p closingconditionalbranchOtherconditionalbranchAlpha:BR taPowPC: b taBSR tabl taRET tabclrBTL R1, tabdnz taBNE R1, tabne taBR tata:BSR tata:RETta:SUBL R1,1,R1BLT R1,taBNE R1,tata: David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19975Why worry about different types of bra
5、nches?lBranches cause stalls of pipelineslDifferent techniques for minimizing branch effects lTake advantage of different type of branch e.g. Unconditional branch ALWAYS occurs. Therefore, hardware can plan for the branch in advance. e.g. difference between loop closing and normal conditional David
6、Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19976Ways of checking conditionlConditional branches need to evaluate a predicatelTwo main approaches Result state IBM 360, 370, PDP-11, VAX-11, x86, Pentium, MC68000, Sparc, PowerPC. Direct Check PDP-10, Cyber/70, PDP-8, CRAY, MI
7、PS, HPPA, Dec Alpha David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19977Result StatelResult state is declared to hold status information related to result of operationlTypical implementation is condition codes or flag registers which are updated after every arithmetic re
8、sult is producedlConditional branch instructions interrogate the flags in subsequent instructions David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19978Result state disadvantageslThe generation of result state is not straightforward; irregular structure occupies additional
9、 chip arealMakes pipeline longerALUTest 0 0= 01 Clock Cycle David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 19979Result state disadvantages .lSequential in concept. How do we pack instructions in Superscalar and VLIW machine?add r1, r2, r3sub r5, r6, r7blt ta1bgt ta2add r
10、1, r2, r3sub r5, r6, r7blt ta1bgt ta2 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199710Direct ChecklNo result state is declaredlSpecified conditions are directly checked by explicit instructionslConditional branching can be requested if the specified conditions are m
11、et. lMay involve one or two instructionslFits better into superscalar architectureslShorter clock cycle because only check when necessary David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199711Comparison of conditional branchesTwo instruction Implementationadd r1, r2, r3cm
12、peq r7, r1, 0bt r7, labeldiv r5, r4, r1One instruction Implementationadd r1, r2, r3beq r1, labeldiv r5, r4, r1 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199712The effect of brancheslNeed to understand whether branches are taken or notlThen tailor the architecture to
13、 perform fastest on most common case.lIt turns out that most branches are taken .FetchDec/RdALUWriteFetchDec/RdALUWriteFetch David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199713Branch statistics .UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch to
14、SubroutineReturn fromSubroutineLoop closingconditionalbranchOtherconditionalbranch 1/3 1/3 1/3Taken for first n-1 iterationsTakenNotTaken 1/6 1/6Taken 5/6 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199714Branch HandlingUtilizing branch delay slotsHandling of unresolv
15、ed conditional branches Avoiding cond. branches Delayed BranchingPerformanceBlockingbranch proc.Speculativebranch proc.Multiwaybranch proc.Branch processingGuardedExecution David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199715Branch HandlinglDelayed branching Utilise oth
16、erwise wasted cycles following branches. Achieved by inserting an instruction behind the branch and executing it before the branch Utilized on Early and Subsequent RISC architectures David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199716Delayed branchingFetchDec/RdALUWrit
17、eFetchDec/RdALUWriteFetchDeadDeadaddbsubadd r1, r2, r3b anywherediv r5, r4, r1anywhere: sub .add r3, r2, r6 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199717Delayed branching .FetchDec/RdALUWriteFetchDec/RdALUWriteaddbdivadd r1, r2, r3bd anywherediv r5, r4, r1anywher
18、e: sub .add r3, r2, r6FetchDec/RdALUWriteFetchDec/RdALUWriteaddDelayed BranchFetchDec/RdALUWritesub David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199718Delayed branching optionslCan reduce to single delay slot if target address available at end of decode phaselCompiler
19、places NOPs in slots and migrates instructions into slots using code migrationlCompiler must perform dataflow analysis David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199719Code migration to fill slotsadd r8, r9, r10bd anywherediv r5, r4, r1anywhere: sub .add r3, r2, r6NO
20、PNOPadd r3, r2, r6bd anywherediv r5, r4, r1anywhere: sub .add r8, r9, r10 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199720Does this work with conditional branches/?lYes, but code migrated into delay slot must be performed unconditionallyadd r8, r9, r10beq r6,anywher
21、ediv r5, r4, r1add r3, r2, r6NOPNOPadd r3, r2, r6div r5, r4, r1add r8, r9, r10beq r6,anywhere David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199721Optimisations - AnnulmentlAnnul delay slot if branch not taken Useful for backward conditional branches Makes it possible to
22、 move a loop body instruction into the delay slotlAnnul delay slot if branch is taken Useful for forward conditional branches Makes it possible to relocate an instruction from sequential path into the delay slot David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199722Annul
23、delay slot if branch not taken1234 BrCDelayLoop bodyConditionalBranch with no AnnulmentLoop requires 5 instructions David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199723Annul delay slot if branch not taken .1234 BrCDelayLoop bodyConditionalBranch withAnnulment1Loop requi
24、res 4 instructions David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199724Annul delay slot if branch is taken1ConditionalBranch with no Annulment23BrCDelay45 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199725Annul delay slot if branch is tak
25、en1ConditionalBranch withAnnulment23BrC45 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199726FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteBranch detectionlIn general, the earlier a branch is detected, the better the handling Reduces numbe
26、r of delay slots requiredlEarly branch detection In parallel Look ahead Integrated with fetchFetchDec/RdALUWrite David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199727Early branch detectionlIn Parallel Branches are detected in parallel with decode of other instructions us
27、ing a dedicated branch decoderlLook ahead Branches are detected from the instruction buffer but ahead of general instruction decodinglIntegrated Branches are detected during instruction fetch David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199728Conditional brancheslThe p
28、roblem with conditional branches is that we may not know until late in the instruction whether we wish to take or reject the control transferTake branch?FetchDec/RdALUWritebzr1,taget bzread r1cmp 0WritePC David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199729Handling unre
29、solved conditional brancheslBlocking branch processinglSpeculative branch processing Branch prediction Extent of speculation Recovery from mis-predictionlMulti-way branching David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199730Blocking branch processinglTrivial approachl
30、Execution of conditional is simply stalled until condition is knownlResult state Can possibly order instructions to reduce delay in waiting for condition codes to be setlDirect checking Need to wait for ALU result in this instructionSet CCBrCFetchDec/RdALUWrite David Abramson, 2000Material from Sima
31、, Fountain and Kacsuk, Addison Wesley 199731Speculative branch processinglPipeline stalls can be avoidedlAfter detection of unresolved conditional branch a guess is made of the outcomelIf guess is correct Speculation confirmed and continued executionlIf guess is incorrect Instructions discarded Fetc
32、h restarted David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199732Branch prediction schemeslFixed predictionlTrue prediction A “true” guess is made Static Prediction Based on object code Dynamic Prediction Based on execution history David Abramson, 2000Material from Sima,
33、 Fountain and Kacsuk, Addison Wesley 199733Fixed Prediction (Guess Not Taken)lDetect unresolved conditional branch and guess as not takenlContinue with the execution of the sequential path, but start the execution of taken path (e.g. calculate BTA)lWhen conditional becomes available, check the guess
34、lIf correct, continue with execution delete taken path pre-processinglIf incorrect, delete speculative execution and continue with taken path David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199734Fixed Prediction (Guess Not Taken)ConditionKnownEvaluate BTACORRECT David Ab
35、ramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199735Fixed Prediction (Guess Not Taken)ConditionKnownEvaluate BTAINCORRECT David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199736Fixed Prediction (Guess Taken)ConditionKnownEvaluate BTAINCORRECT David Ab
36、ramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199737Static PredictionlLike fixed prediction, but some properties of the object code are used to decide whether to use always taken or not takenlHas possibility of performing better since some information is used. David Abramson, 2
37、000Material from Sima, Fountain and Kacsuk, Addison Wesley 199738Static Prediction .lTypical options are Opcode based prediction For certain opcodes the branch is assumed to be taken, for others not taken. Displacement based prediction If displacement 0 taken, else not taken Compiler directed Predic
38、t bit in instruction set by compiler analysis or trace dataOpBTASign bit David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199739Dynamic predictionlPrediction is made on branch historylPhilosophy is that history is good guide to future behaviourlGood for loops because tend
39、to iterate multiple timeslGood for some conditional branches depending on behaviour of datalGood for exception condition checking What Happened Last Time? David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199740Dynamic Prediction TechniqueslExplicit technique Branch history
40、 explicitly stated history bits 1, 2 or 3 bit schemes Number of bits represent likelihood of branch being taken in futurelImplicit technique Branch being “recorded” is an indication that branch was taken Roughly equivalent to 1 bit scheme David Abramson, 2000Material from Sima, Fountain and Kacsuk,
41、Addison Wesley 199741Branch History BitslFor each branch, record a status field.lOne bit status Did last occurrence of branch occur?lTwo bit status State table decideswhich statelThree bit scheme History of last 3 branches Majority decision on outcomeStateDesignation11Strongly Taken10Weakly Taken01W
42、eakly not taken00Strongly not taken David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199742State transitions in 2 bit schemelFor each branch AT = Actually taken ANT = Actually not takenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTPrediction Tak
43、enPrediction Not Taken David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199743State transitions in 2 bit schemelFor each branch AT = Actually taken ANT = Actually not takenStronglyTakenWeaklyTakenWeaklyNOTTakenStronglyNOTTakenATANTANTANTATATATANTStronglyNOTTakenANTWeaklyNO
44、TTakenATWeaklyTakenAT ANTWeaklyNOTTakenATWeaklyTaken David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199744Implicit Dynamic SchemelTwo main schemes Branch Target Access Cache (BTAC) Branch Target Instruction Cache (BTIC)lBoth schemes introduce an extra cachelEntries are o
45、nly stored for taken branches Not taken branches are not storeslIf an entry is in cache, then next branch is takenlBehaves like 1 bit David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199745BTAC and BTIC implementationlStore Branch Address PLUS The target address itself (BT
46、AC) The instruction at the target (BTIC)10002000Load R1, .100020001000Load R1, . David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199746BTAC implementation .Load R1, .1000200010002000Dont take branchSpeculativelyWhoops!Add this branch address to the cacheNext timeTake bran
47、chSpeculatively David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199747Performance differences in BTAC and BTIClBTAC delivers address of branch in time for next fetchlBTIC delivers actual instruction. Can be slower because it does not require fetchLookup BTACFetchDec/RdExe
48、WriteFetchDec/RdExeWriteLookup BTIC David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199748Implementation of History bitslPlacement of history bits I-cache Alpha, UltraSparc Branch History Table (BHT) Power PC 604, 620, R10000 Branch Target Address Cache (BTAC) MC68060, Pe
49、ntium, R8000lI-cache and BHT effectively the same David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199749I-Cache and BHT (Power PC 604)I-cache16 KFour way set associativeInstruction fetch addressBHT128 x 4 entriesPredictionLogic4 instr./cycle2 History BitsTaken/Not takenBT
50、A for a taken guessDecode Queue4 x 1 Instr.Issue Queue David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199750Prediction AccuracyP = fc * Pc + fm * Pmwhere fc: Probability of correctly predicting branches fm: Probability of mis-predicting branches Pc: Penalty of correctly
51、predicted branches Pm: Penalty of mis-predicted branches David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199751Prediction Accuracy .Branch Penalty P0 Against accuracy00.20.40.60.811.21.41.6Fixed - AlwaysTaken (0.625)Static,displacementbased (0.685)Dynamic 1 bit(0.89)Dynam
52、ic 2 bit(0.93)Prediction MethodPenalty (P0) (Cycles)Pm=4Pm=2Pm=1 David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199752Recovery from MispredictionlIf branch prediction hardware makes wrong guess, need to revert to alternative path of execution.FetchDec/RdALUWriteFetchDec/
53、RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrite? David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199753Recovery from Misprediction .lFor pipelined machines, may be possible to abort register writeslStore instructions are harder to recoverlCondition codes might also need
54、to be restored.FetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteRegister File David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199754Schemes to shorten mis-prediction recoverylBasic prior measures for recovery In a “taken” guess, save sequential
55、address In a “not taken” guess, pre-calculate and save branch target address Requires two address registers per speculated conditional branchSave this addressSave this address David Abramson, 2000Material from Sima, Fountain and Kacsuk, Addison Wesley 199755Schemes to shorten mis-prediction recovery.lEnhanced prior measures to shorten recovery In a “taken” guess save sequ
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 【正版授权】 ISO 9654:1989 EN Pliers and nippers for electronics - Single-purpose nippers - Cutting nippers
- 【正版授权】 ISO 965-1:1998 EN ISO general-purpose metric screw threads - Tolerances - Part 1: Principles and basic data
- 【正版授权】 ISO 9588:1999 EN Metallic and other inorganic coatings - Post-coating treatments of iron or steel to reduce the risk of hydrogen embrittlement
- 【正版授权】 ISO 9559:1989 EN Photography - 35 mm filmstrip projectors - Specifications for frame masks
- 【正版授权】 ISO 9518:1992 EN Forestry machinery - Portable chain-saws - Kickback test
- 【正版授权】 ISO 9413:2019 EN Tyre valves - Dimensions and designation
- 【正版授权】 ISO 9342-2:2005 EN Optics and optical instruments - Test lenses for calibration of focimeters - Part 2: Test lenses for focimeters used for measuring contact lenses
- 【正版授权】 ISO 9311-2:2011 EN Adhesives for thermoplastic piping systems - Part 2: Determination of shear strength
- 【正版授权】 ISO 9241-971:2020 EN Ergonomics of human-system interaction - Part 971: Accessibility of tactile/haptic interactive systems
- 【正版授权】 ISO 9225:2012 EN Corrosion of metals and alloys - Corrosivity of atmospheres - Measurement of environmental parameters affecting corrosivity of atmospheres
- 口腔颌面外科_颌骨骨折
- 色盲检测图(第五版)最新版本
- 压铸工艺计算表
- 跳绳比赛作文指导.ppt
- 低损耗氮化硅可重构微波光子滤波器
- 地基强夯工程施工方案
- 21病区PDCA案例(跌倒坠床)(专业分析)
- 附件报名登记表
- 常用地质图例及符号
- Thin Film Process Introduction
- 小学生奋进新时代礼赞新中国600字作文5篇
评论
0/150
提交评论