Ch 4-1 Instruction-Level Parallelism: The Scoreboard Algorithm (Modern Microprocessors)
Review
- Basic pipeline concepts
- Preconditions for pipelining
- Pipeline performance metrics
- The basic DLX pipeline: the five stages and what each one does; the role of the different datapaths; where the inter-stage (pipeline) registers sit and what they hold

Table 3.1 Operations performed in each stage of the DLX pipeline

IF (any instruction type):
  IF/ID.IR ← Mem[PC];
  IF/ID.NPC, PC ← (if EX/MEM.cond {EX/MEM.ALUOutput} else {PC + 4});

ID (any instruction type):
  ID/EX.A ← Regs[IF/ID.IR6..10]; ID/EX.B ← Regs[IF/ID.IR11..15];
  ID/EX.NPC ← IF/ID.NPC; ID/EX.IR ← IF/ID.IR;
  ID/EX.Imm ← (IR16)^16 ## IR16..31;   (sign-extended 16-bit immediate)

EX:
  ALU instruction:
    EX/MEM.IR ← ID/EX.IR;
    EX/MEM.ALUOutput ← ID/EX.A op ID/EX.B;  or  EX/MEM.ALUOutput ← ID/EX.A op ID/EX.Imm;
    EX/MEM.cond ← 0;
  Load/Store instruction:
    EX/MEM.IR ← ID/EX.IR; EX/MEM.B ← ID/EX.B;
    EX/MEM.ALUOutput ← ID/EX.A + ID/EX.Imm;
    EX/MEM.cond ← 0;
  Branch instruction:
    EX/MEM.ALUOutput ← ID/EX.NPC + ID/EX.Imm;
    EX/MEM.cond ← (ID/EX.A op 0);

MEM:
  ALU instruction:
    MEM/WB.IR ← EX/MEM.IR; MEM/WB.ALUOutput ← EX/MEM.ALUOutput;
  Load/Store instruction:
    MEM/WB.IR ← EX/MEM.IR;
    MEM/WB.LMD ← Mem[EX/MEM.ALUOutput];  or  Mem[EX/MEM.ALUOutput] ← EX/MEM.B;

WB:
  ALU instruction:
    Regs[MEM/WB.IR16..20] ← MEM/WB.ALUOutput;  or  Regs[MEM/WB.IR11..15] ← MEM/WB.ALUOutput;
  Load instruction:
    Regs[MEM/WB.IR11..15] ← MEM/WB.LMD;
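To make the register transfers of Table 3.1 concrete, here is a minimal Python sketch that pushes one register-register ALU instruction through IF, ID, EX, MEM and WB, one stage after another (no overlap, no hazards). The pipeline latches are plain dicts named after the table. The `encode_rr` helper and its field layout (rs1 in IR6..10, rs2 in IR11..15, rd in IR16..20, opcode and function bits left as zero) are hypothetical, and the ALU operation is passed in as a Python function instead of being decoded from the instruction.

```python
import operator


def field(ir: int, start: int, end: int) -> int:
    """Return bits IR[start..end] using DLX numbering (bit 0 is the most significant)."""
    width = end - start + 1
    return (ir >> (31 - end)) & ((1 << width) - 1)


def sign_extend16(value: int) -> int:
    """(IR16)^16 ## IR16..31: replicate the immediate's sign bit into the upper 16 bits."""
    return value - 0x10000 if value & 0x8000 else value


def encode_rr(rs1: int, rs2: int, rd: int) -> int:
    """Hypothetical R-type encoding: rs1 -> IR6..10, rs2 -> IR11..15, rd -> IR16..20."""
    return (rs1 << 21) | (rs2 << 16) | (rd << 11)


def run_alu_rr(mem, regs, pc, alu_op):
    """Execute one register-register ALU instruction as in Table 3.1; return the next PC."""
    # IF
    if_id = {"IR": mem[pc], "NPC": pc + 4}
    # ID: both source registers and the immediate are read for every instruction
    ir = if_id["IR"]
    id_ex = {
        "A": regs[field(ir, 6, 10)],
        "B": regs[field(ir, 11, 15)],
        "NPC": if_id["NPC"],
        "IR": ir,
        "Imm": sign_extend16(field(ir, 16, 31)),
    }
    # EX (ALU instruction): compute A op B, cond is 0
    ex_mem = {"IR": id_ex["IR"], "ALUOutput": alu_op(id_ex["A"], id_ex["B"]), "cond": 0}
    # MEM (ALU instruction): pass the result along
    mem_wb = {"IR": ex_mem["IR"], "ALUOutput": ex_mem["ALUOutput"]}
    # WB (register-register ALU instruction): destination is IR16..20
    regs[field(mem_wb["IR"], 16, 20)] = mem_wb["ALUOutput"]
    # Next PC chosen as in the IF row: ALUOutput only if cond is set, else PC + 4
    return ex_mem["ALUOutput"] if ex_mem["cond"] else if_id["NPC"]


regs = [0] * 32
regs[2], regs[3] = 5, 7
mem = {0: encode_rr(2, 3, 1)}          # stands for ADD R1, R2, R3
next_pc = run_alu_rr(mem, regs, 0, operator.add)
print(regs[1], next_pc)                # 12 4
```

Running the stages strictly in sequence hides the point of pipelining, but it shows which latch field each stage reads and writes.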

Review: hazards in the pipeline
- Structural hazards: need more hardware resources
- Data hazards: need forwarding and compiler scheduling
- Control hazards: detect the condition early, compute the target address early, delayed branches, prediction
- How each kind arises and how to avoid it
- Case study: characteristics of the MIPS R4000

Pipeline diagrams (each instruction enters IF one clock cycle after the instruction before it):

ADD R1, R2, R3  IF ID EX ME WB
SUB R5, R1, R7     IF ID EX ME WB
XOR R6, R1, R7        IF ID EX ME WB
OR  R7, R1, R7           IF ID EX ME WB

LW  R1, 45(R2)  IF ID EX ME WB
SUB R8, R6, R7     IF ID EX ME WB
ADD R5, R1, R7        IF ID EX ME WB

ADD R1, R2, R3  IF ID EX ME WB
SUB R8, R6, R7     IF ID EX ME WB
LW  R5, 45(R1)        IF ID EX ME WB

LW  R1, 30(R2)  IF ID EX ME WB
SUB R8, R6, R7     IF ID EX ME WB
LW  R5, 45(R1)        IF ID EX ME WB

ADD R1, R2, R3  IF ID EX ME WB
SW  R5, 30(R1)     IF ID EX ME WB
SW  R6, 45(R1)        IF ID EX ME WB

ADD R1, R2, R3  IF ID EX ME WB
SW  R1, 45(R3)     IF ID EX ME WB
SW  R1, 45(R4)        IF ID EX ME WB

LW  R1, 56(R2)  IF ID EX ME WB
SW  R1, 45(R3)     IF ID EX ME WB
SW  R1, 45(R4)        IF ID EX ME WB
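The diagrams above are the standard forwarding cases. As a rough check of which pairs need forwarding and which would still stall, here is a small Python sketch. It assumes the classic 5-stage DLX pipeline with full forwarding; register writes in the first half of WB and reads in the second half of ID, so a producer three or more instructions back needs no forwarding; store data that can be forwarded into the MEM stage; and a one-cycle stall only when a load is immediately followed by an instruction that needs the loaded value at the start of EX. The `Instr` record and its register lists are written by hand (there is no instruction parser), and the last sequence is a hypothetical load-use case that does not appear in the diagrams.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]            # register written (None for stores)
    ex_srcs: Tuple[str, ...]       # registers needed at the start of EX
    mem_src: Optional[str] = None  # store data, needed at the start of MEM
    is_load: bool = False


def classify(seq):
    """Return (consumer, producer, action) for every RAW dependence that needs forwarding.

    Assumption: only producers 1 or 2 instructions back are still in flight;
    older results are already in the register file when ID reads it.
    """
    events = []
    for j, consumer in enumerate(seq):
        needed = [(r, "EX") for r in consumer.ex_srcs]
        if consumer.mem_src is not None:
            needed.append((consumer.mem_src, "MEM"))
        for reg, stage in needed:
            for i in range(j - 1, max(j - 3, -1), -1):   # nearest producer first
                producer = seq[i]
                if producer.dest == reg:
                    if producer.is_load and j - i == 1 and stage == "EX":
                        action = "stall 1 cycle, then forward"
                    else:
                        action = "forward"
                    events.append((consumer.text, producer.text, action))
                    break
    return events


seq_alu = [
    Instr("ADD R1,R2,R3", "R1", ("R2", "R3")),
    Instr("SUB R5,R1,R7", "R5", ("R1", "R7")),
    Instr("XOR R6,R1,R7", "R6", ("R1", "R7")),
    Instr("OR R7,R1,R7", "R7", ("R1", "R7")),
]
seq_store = [
    Instr("LW R1,56(R2)", "R1", ("R2",), is_load=True),
    Instr("SW R1,45(R3)", None, ("R3",), "R1"),
    Instr("SW R1,45(R4)", None, ("R4",), "R1"),
]
seq_load_use = [                  # hypothetical, not one of the diagrams
    Instr("LW R1,45(R2)", "R1", ("R2",), is_load=True),
    Instr("ADD R5,R1,R7", "R5", ("R1", "R7")),
]
for seq in (seq_alu, seq_store, seq_load_use):
    for event in classify(seq):
        print(event)
```

Under these assumptions the OR instruction needs no forwarding (ADD is three instructions back), and the stores after LW get their data forwarded into MEM without stalling.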

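Ch 4 Instruction-Level Parallelism
Embedded System Lab, Fall 2012

Outline
- Basic instruction scheduling methods
- The scoreboard algorithm
- Tomasulo's algorithm

4.1 Instruction-Level Parallelism
- Dependences are an inherent property of program execution
- Dependences give rise to data hazards
- Hazards cause the CPU to stall
- Kinds of dependence: data dependences, structural dependences, control dependences
- ILP: overlapping the execution of independent instructions, for example
    Loop: LD   F0,0(R1)
          SUBI R2,R2,8
          SUBI R3,R3,8
          ADDD F4,F0,F2

Name dependences
Another kind of dependence is the name dependence: two instructions use the same name (register or memory location) but do not exchange data. Instruction i comes before instruction j in program order.
- Antidependence (WAR): instruction j writes a register or memory location that instruction i reads; instruction i must read it first.
- Output dependence (WAW): instructions i and j write the same register or memory location; the order of the two writes must be preserved.

Are there name dependences in the following code?
1  Loop: LD   F0,0(R1)
2        ADDD F4,F0,F2
3        SD   0(R1),F4
4        LD   F0,-8(R1)
5        ADDD F4,F0,F2
6        SD   -8(R1),F4
7        LD   F0,-16(R1)
8        ADDD F4,F0,F2
9        SD   -16(R1),F4
10       LD   F0,-24(R1)
11       ADDD F4,F0,F2
12       SD   -24(R1),F4
13       SUBI R1,R1,#32
14       BNEZ R1,LOOP
15       NOP

How can name dependences be eliminated?
Eliminating name dependences:
1  Loop: LD   F0,0(R1)
2        ADDD F4,F0,F2
3        SD   0(R1),F4      ;drop SUBI & BNEZ
4        LD   F6,-8(R1)
5        ADDD F8,F6,F2
6        SD   -8(R1),F8     ;drop SUBI & BNEZ
7        LD   F10,-16(R1)
8        ADDD F12,F10,F2
9        SD   -16(R1),F12   ;drop SUBI & BNEZ
10       LD   F14,-24(R1)
11       ADDD F16,F14,F2
12       SD   -24(R1),F16
13       SUBI R1,R1,#32     ;alter to 4*8
14       BNEZ R1,LOOP
15       NOP
This technique is called "register renaming".

Some definitions related to ILP
- Basic block: straight-line code with no branches
- A whole program is made of basic blocks connected by branch instructions
- In MIPS programs, branches make up about 15% of instructions, so a basic block is only about 4 to 7 instructions long
- OS code has relatively few branches: it manages resources, fills in status registers and control registers, and sets control variables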

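Parallelism across basic blocks (loop-level parallelism)
- Characteristics of loops: the branch that controls a loop is strongly biased. It is taken in the vast majority of cases, so it is easy to predict, but some prediction scheme is still required.

Average pipeline CPI:
Pipeline CPI = Ideal pipeline CPI + Structural stalls + RAW stalls + WAR stalls + WAW stalls + Control stalls

This chapter studies methods and techniques for reducing the number of stalls.

Basic approaches to instruction scheduling
- Software approach (compiler optimization)
  - gcc: 17% of instructions are control instructions, i.e. roughly 5 instructions + 1 branch per basic block
  - Extract more parallelism within a basic block
  - Exploit loop-level parallelism
- Hardware approach: dynamic scheduling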
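The CPI equation and the basic-block figures above are plain arithmetic; this Python sketch just evaluates them. The stall counts plugged into the equation are an assumption borrowed from the unscheduled FP loop analyzed later in this chapter: 5 instructions per iteration, 4 RAW stall cycles, and one unfilled delay slot counted as a control stall. They are divided by the instruction count to get stall cycles per instruction.

```python
def pipeline_cpi(ideal, structural, raw, war, waw, control):
    """Pipeline CPI = ideal CPI + stall cycles per instruction from each source."""
    return ideal + structural + raw + war + waw + control


def avg_basic_block_length(branch_fraction):
    """If a fraction f of instructions are branches, a branch occurs about every 1/f instructions."""
    return 1.0 / branch_fraction


# Assumption: per-iteration stall counts of the unscheduled FP loop (LD, ADDD, SD, SUBI, BNEZ).
n = 5
cpi = pipeline_cpi(ideal=1.0, structural=0, raw=4 / n, war=0, waw=0, control=1 / n)
print(f"CPI = {cpi:.2f}, cycles per iteration = {cpi * n:.0f}")
print(f"gcc (17% branches): about {avg_basic_block_length(0.17):.1f} instructions per basic block")
print(f"MIPS (15% branches): about {avg_basic_block_length(0.15):.1f} instructions per basic block")
```

The 17% figure gives about 5.9 instructions per basic block, which matches the "5 instructions + 1 branch" rule of thumb.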

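Static vs. dynamic scheduling
- 8086: I/O cycles and CPU cycles
- 386: overlapped instruction execution
- 486: instruction-level parallelism
- Dynamic instruction scheduling: Pentium Pro, Pentium II, III, IV, AMD Athlon, MIPS R10K, R12K, Sun UltraSPARC, PowerPC 603, G3, G4, G5 (IBM-Motorola-Apple), Alpha 21264
- Static scheduling: Itanium and Transmeta Crusoe

A loop example
for (i = 1; i <= 1000; i++)
    x(i) = x(i) + y(i);
- Characteristics: computing x(i) involves no dependences between iterations
- Ways to parallelize it:
  - The simplest: loop unrolling
  - Vector execution, X = X + Y: used since the 1960s (Cray, Hitachi, NEC, Fujitsu); today it appears in the form of vector acceleration units, GPUs and DSPs

A simple loop and its assembly code
for (i = 1; i <= 1000; i++)
    x(i) = x(i) + s;

Loop: LD   F0,0(R1)   ;F0=vector element
      ADDD F4,F0,F2   ;add scalar from F2
      SD   0(R1),F4   ;store result
      SUBI R1,R1,8    ;decrement pointer 8B (DW)
      BNEZ R1,Loop    ;branch R1!=zero
      NOP             ;delayed branch slot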
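Loop unrolling at the source level can be sketched in Python for the loop x(i) = x(i) + s. This is only an illustration: Python lists are 0-based, the unroll factor of 4 mirrors the unrolling used later in the chapter, and the cleanup loop for lengths that are not a multiple of 4 is an addition not present in the slides. Because no iteration depends on another, the unrolled version produces the same result as the plain loop.

```python
def add_scalar(x, s):
    """Reference loop: x[i] = x[i] + s for every element."""
    for i in range(len(x)):
        x[i] = x[i] + s


def add_scalar_unrolled4(x, s):
    """Same computation with the loop body replicated 4 times."""
    n = len(x)
    i = 0
    while i + 3 < n:                  # four independent updates per trip
        x[i] = x[i] + s
        x[i + 1] = x[i + 1] + s
        x[i + 2] = x[i + 2] + s
        x[i + 3] = x[i + 3] + s
        i += 4
    while i < n:                      # cleanup when n is not a multiple of 4
        x[i] = x[i] + s
        i += 1


a = [float(k) for k in range(1000)]
b = list(a)
add_scalar(a, 2.5)
add_scalar_unrolled4(b, 2.5)
print(a == b)  # True
```

The unrolled body executes one loop-control test per four elements, which is where the savings in SUBI/BNEZ overhead come from in the assembly version.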

Dependences in the FP loop (the same loop as above)

Instruction producing result | Instruction using result | Latency (clock cycles)
FP ALU op                    | Another FP ALU op        | 3
FP ALU op                    | Store double             | 2
Load double                  | FP ALU op                | 1
Load double                  | Store double             | 0
Integer op                   | Integer op               | 0

Where do stalls have to be inserted? (Assume the branch computes its target address and condition in the ID stage.)

Stalls in the FP loop: 10 clocks. Can the code be reordered to minimize the stalls?
1  Loop: LD   F0,0(R1)   ;F0=vector element
2        stall
3        ADDD F4,F0,F2   ;add scalar in F2
4        stall
5        stall
6        SD   0(R1),F4   ;store result
7        SUBI R1,R1,8    ;decrement pointer 8B (DW)
8        stall
9        BNEZ R1,Loop    ;branch R1!=zero
10       stall           ;delayed branch slot

Minimum number of stalls in the FP loop: 6 clocks
1  Loop: LD   F0,0(R1)
2        SUBI R1,R1,8
3        ADDD F4,F0,F2
4        stall
5        BNEZ R1,Loop    ;delayed branch
6        SD   8(R1),F4   ;altered when move past SUBI
Swap BNEZ and SD by changing the address of SD.

Would unrolling the loop 4 times improve performance?
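The offset change when a load or store is moved below SUBI is mechanical: after SUBI R1,R1,#k the base register is k smaller, so the displacement must grow by k to keep the effective address R1 + offset unchanged. A minimal Python sketch, assuming instructions are written as text in the disp(R1) form used on these slides; `move_below_subi` is a made-up helper name.

```python
import re


def move_below_subi(instr: str, decrement: int, base: str = "R1") -> str:
    """Rewrite every disp(base) operand so the address is unchanged after SUBI base,base,#decrement."""
    pattern = r"(-?\d+)\(" + re.escape(base) + r"\)"
    return re.sub(pattern, lambda m: f"{int(m.group(1)) + decrement}({base})", instr)


print(move_below_subi("SD 0(R1),F4", 8))        # SD 8(R1),F4
print(move_below_subi("SD -16(R1),F12", 32))    # SD 16(R1),F12
print(move_below_subi("SD -24(R1),F16", 32))    # SD 8(R1),F16
```

Instructions without a disp(R1) operand, such as ADDD F4,F0,F2, come back unchanged.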

Unrolling the loop 4 times (straightforward way)
Rewrite loop to minimize stalls?
1  Loop: LD   F0,0(R1)
         stall
2        ADDD F4,F0,F2
         stall
         stall
3        SD   0(R1),F4      ;drop SUBI & BNEZ
4        LD   F6,-8(R1)
         stall
5        ADDD F8,F6,F2
         stall
         stall
6        SD   -8(R1),F8     ;drop SUBI & BNEZ
7        LD   F10,-16(R1)
         stall
8        ADDD F12,F10,F2
         stall
         stall
9        SD   -16(R1),F12   ;drop SUBI & BNEZ
10       LD   F14,-24(R1)
         stall
11       ADDD F16,F14,F2
         stall
         stall
12       SD   -24(R1),F16
13       SUBI R1,R1,#32     ;alter to 4*8
         stall
14       BNEZ R1,LOOP
15       NOP

15 + 4 × (1 + 2) + 1 = 28 clock cycles, or 7 per iteration.
Assumes the number of iterations (R1/8) is a multiple of 4.
How are the name dependences resolved? Each unrolled copy uses different registers (register renaming).
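Register renaming of the unrolled body can be done mechanically. The Python sketch below is a simplified compile-time renamer for straight-line code, an assumption for illustration rather than a method from the slides: every write to a register that has already been written gets a fresh register from a free list, and later reads use the new name. Only FP registers are listed as operands; the integer base register R1 and the offsets are left out. Starting from the naive 4-copy body (F0/F4 in every copy) and the free list F6, F8, ..., F16, it reproduces the register assignment of the unrolled loop above and removes every WAR and WAW dependence.

```python
def name_dependences(code):
    """List (i, j, kind) for every WAR or WAW register pair between instructions i < j."""
    deps = []
    for j, (_, dest_j, _) in enumerate(code):
        if dest_j is None:
            continue
        for i in range(j):
            _, dest_i, srcs_i = code[i]
            if dest_i == dest_j:
                deps.append((i, j, "WAW"))
            if dest_j in srcs_i:
                deps.append((i, j, "WAR"))
    return deps


def rename(code, free_regs):
    """Give each repeated write a fresh register and redirect later reads to it."""
    mapping = {}            # original name -> register currently holding its value
    written = set()
    free = list(free_regs)
    out = []
    for op, dest, srcs in code:
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = dest
        if dest is not None:
            if dest in written:         # reusing the name would create WAW/WAR
                new_dest = free.pop(0)
            written.add(dest)
            mapping[dest] = new_dest
        out.append((op, new_dest, new_srcs))
    return out


# (op, destination, FP sources); SD writes memory, so it has no register destination.
body = [("LD", "F0", ()), ("ADDD", "F4", ("F0", "F2")), ("SD", None, ("F4",))]
unrolled = body * 4
renamed = rename(unrolled, ["F6", "F8", "F10", "F12", "F14", "F16"])

print(len(name_dependences(unrolled)), len(name_dependences(renamed)))   # 24 0
for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

Only true (RAW) dependences, such as LD F6 feeding ADDD F8, remain after renaming, which is what lets the scheduler interleave the four copies.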

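Unrolled loop with the minimum number of stalls
After code motion:
- The SDs are moved after SUBI; their offsets must be adjusted.
- The loads are moved before the SDs; again, pay attention to the offsets.
1  Loop: LD   F0,0(R1)
2        LD   F6,-8(R1)
3        LD   F10,-16(R1)
4        LD   F14,-24(R1)
5        ADDD F4,F0,F2
6        ADDD F8,F6,F2
7        ADDD F12,F10,F2
8        ADDD F16,F14,F2
9        SD   0(R1),F4
10       SD   -8(R1),F8
11       SUBI R1,R1,#32
12       SD   16(R1),F12
13       BNEZ R1,LOOP
14       SD   8(R1),F16     ; 8-32 = -24
14 clock cycles, or 3.5 per iteration.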
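The cycle counts on the last few slides (10, 6, 28 and 14) follow from the latency table. The Python sketch below is a simplified single-issue, in-order model written under stated assumptions: an instruction issues one cycle after its predecessor, or `latency + 1` cycles after the producer of any source it needs, using the latency table; an integer result feeding a branch costs 1 extra cycle because the branch is resolved in ID (assumed, consistent with the stall before BNEZ); an integer result feeding a store address costs 0 (assumed); and the instruction after a branch occupies the single delay slot, so the branch itself is delayed if that instruction is not ready.

```python
LATENCY = {
    ("fpalu", "fpalu"): 3,
    ("fpalu", "store"): 2,
    ("load", "fpalu"): 1,
    ("load", "store"): 0,
    ("int", "int"): 0,
    ("int", "store"): 0,    # assumption: SUBI result used as a store address
    ("int", "branch"): 1,   # assumption: branch condition is needed in ID
}


def earliest(instr, after, producers):
    """First cycle > `after` at which instr's source operands are ready."""
    _, kind, _, srcs = instr
    t = after + 1
    for reg in srcs:
        if reg in producers:
            cycle, p_kind = producers[reg]
            t = max(t, cycle + LATENCY.get((p_kind, kind), 0) + 1)
    return t


def schedule(code):
    """Issue cycle of each instruction: single issue, in order, one branch delay slot."""
    producers, cycles, last, i = {}, [], 0, 0
    while i < len(code):
        group = code[i:i + 2] if code[i][1] == "branch" else code[i:i + 1]
        t = earliest(group[0], last, producers)
        if len(group) == 2:
            # The delay-slot instruction must issue right after the branch.
            t = max(t, earliest(group[1], t, producers) - 1)
        for k, instr in enumerate(group):
            cycles.append(t + k)
            if instr[2] is not None:
                producers[instr[2]] = (t + k, instr[1])
        last = t + len(group) - 1
        i += len(group)
    return cycles


def ld(d): return ("LD", "load", d, ("R1",))
def addd(d, s): return ("ADDD", "fpalu", d, (s, "F2"))
def sd(s): return ("SD", "store", None, ("R1", s))


SUBI = ("SUBI", "int", "R1", ("R1",))
BNEZ = ("BNEZ", "branch", None, ("R1",))
NOP = ("NOP", "nop", None, ())
PAIRS = (("F0", "F4"), ("F6", "F8"), ("F10", "F12"), ("F14", "F16"))

plain = [ld("F0"), addd("F4", "F0"), sd("F4"), SUBI, BNEZ, NOP]
scheduled = [ld("F0"), SUBI, addd("F4", "F0"), BNEZ, sd("F4")]
unrolled = [x for a, b in PAIRS for x in (ld(a), addd(b, a), sd(b))] + [SUBI, BNEZ, NOP]
unrolled_sched = ([ld(a) for a, _ in PAIRS] + [addd(b, a) for a, b in PAIRS]
                  + [sd("F4"), sd("F8"), SUBI, sd("F12"), BNEZ, sd("F16")])

for name, prog in [("plain", plain), ("scheduled", scheduled),
                   ("unrolled", unrolled), ("unrolled+scheduled", unrolled_sched)]:
    print(name, schedule(prog)[-1])
```

Under these assumptions the model prints 10, 6, 28 and 14 clock cycles, matching the slides; the offsets are omitted because they do not affect timing.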

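Summary of the loop-unrolling example
- When SD is moved past SUBI and BNEZ, the SD offset must be adjusted.
- Loop unrolling is an effective way to reduce stalls in loops whose iterations are independent (loop-level parallelism).
- Different iterations use different registers.
- Instruction scheduling must leave the program's results unchanged.

Instruction reordering + loop unrolling (total clock cycles for 1000 iterations):
- No optimization: 10000
- Instruction reordering: 6000
- 4× loop unrolling: 7000
- 4× loop unrolling + instruction reordering: 3500

Loop unrolling (1/3)
Example: which data dependences exist in the following code? (A, B, C point to distinct, non-overlapping memory regions.)
for (i=1; i<=100; i=i+1) {
    A[i+1] = A[i] + C[i];     /* S1 */
    B[i+1] = B[i] + A[i+1];   /* S2 */
}
1. S2 uses A[i+1], computed by S1 in the same iteration.
2. S1 uses a value computed by S1 in the previous iteration; likewise, S2 uses a value computed by S2 in the previous iteration.
A dependence between iterations like this is called a "loop-carried dependence". It means the iterations depend on each other and cannot execute in parallel, unlike the earlier example, whose iterations were independent.

Loop unrolling (2/3)
Example: A, B, C, D are distinct and non-overlapping.
for (i=1; i<=100; i=i+1) {
    A[i] = A[i] + B[i];       /* S1 */
    B[i+1] = C[i] + D[i];     /* S2 */
}
1. S1 and S2 do not depend on each other within an iteration; interchanging S1 and S2 does not affect the program's correctness.
2. On the first iteration, S1 uses B[1], whose value is set before the loop starts; on later iterations, S1 depends on the B[i] computed by S2 in the previous iteration.

Loop unrolling (3/3)
OLD:
for (i=1; i<=100; i=i+1) {
    A[i] = A[i] + B[i];       /* S1 */
    B[i+1] = C[i] + D[i];     /* S2 */
}
NEW:
A[1] = A[1] + B[1];
for (i=1; i<=99; i=i+1) {
    B[i+1] = C[i] + D[i];
    A[i+1] = A[i+1] + B[i+1];
}
B[101] = C[100] + D[100];

Midterm review
- Instruction-level parallelism: multiple instructions execute in parallel in the pipeline
- Pipelining
  - Drawbacks of pipelines? Data, control and structural dependences; in-order (sequential) execution
  - Solutions: instruction scheduling, loop unrolling, renaming
  - The scoreboard and Tomasulo algorithms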
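The OLD/NEW rewrite in "Loop unrolling (3/3)" can be checked numerically. This Python sketch runs both versions on the same random data and compares the results; the arrays are padded to index 101 so the 1-based subscripts can be used exactly as written. In the NEW loop, iteration i writes only B[i+1] and A[i+1] and reads them back within the same iteration, so no value flows from one iteration to the next.

```python
import random


def old_loop(A, B, C, D):
    for i in range(1, 101):
        A[i] = A[i] + B[i]          # S1
        B[i + 1] = C[i] + D[i]      # S2


def new_loop(A, B, C, D):
    A[1] = A[1] + B[1]
    for i in range(1, 100):
        B[i + 1] = C[i] + D[i]
        A[i + 1] = A[i + 1] + B[i + 1]
    B[101] = C[100] + D[100]


rng = random.Random(0)
data = [[rng.randint(-100, 100) for _ in range(102)] for _ in range(4)]  # A, B, C, D
old = [list(arr) for arr in data]
new = [list(arr) for arr in data]
old_loop(*old)
new_loop(*new)
print(old == new)  # True
```

Equal results on one data set do not prove the transformation correct, but the check does catch mistakes in the peeled first and last statements.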

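Simple loop and its assembly code: for (i=1; i …
… out-of-order completion
Hardware approaches: the scoreboard algorithm and Tomasulo's algorithm.

Hardware approach 1: the scoreboard
(Figure: the basic concept of the scoreboard)

The four stages of scoreboard control (1/2)
1. Issue: issue the instruction and check for structural hazards.
If the functional unit the instruction needs is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structures. If there is a structural hazard or a WAW hazard, issue of this instruction stalls, and no later instruction is issued until the hazard is cleared.
2. Read operands: read the operands once there are no data hazards.
A source operand is available if no earlier-issued, still-active instruction is going to write the source register, or if the functional unit that writes it has already completed the write. When the operands are available, the scoreboard tells the functional unit to read them and prepare to execute. The scoreboard thus resolves RAW hazards dynamically in this step, and instructions may execute out of order.

The four stages of scoreboard control (2/2)
3. Execution (EX): execute once the operands have been read.
The functional unit starts executing when it receives its operands. When the result is ready, it notifies the scoreboard that the instruction has completed execution.
4. Write result (WR): finish execution.
Once the scoreboard learns that the functional unit has finished, it checks for WAR hazards. If there is none, the result is written; if there is a WAR hazard, the instruction stalls.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
The CDC 6600 scoreboard stalls SUBD and lets it enter the WR stage only after ADDD has read its operands.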

Question: how does the scoreboard relate to the DLX pipeline? (Scoreboard stages: IS, RO, EX, WR)

Structure of the scoreboard
1. Instruction status: records which of the four steps each executing instruction is in.
2. Functional unit status: records the state of each functional unit (FU), using 9 fields per unit:
   Busy: whether the unit is busy
   Op: the operation the unit performs
   Fi: destination register number
   Fj, Fk: source register numbers
   Qj, Qk: the functional units that produce source operands Fj, Fk
   Rj, Rk: flags indicating whether source operands Fj, Fk are ready; set to No once the operands have been read
3. Register result status: if a functional unit is going to write a register, records which unit it is. If no instruction is going to write the register, the field is blank.
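The three tables can be modeled directly. The Python sketch below keeps one `FUStatus` row per functional unit with the nine fields listed above and a dict for the register result status, and spells out the stage checks in the usual textbook form: issue needs a free unit and no pending write to the destination (structural and WAW), read operands needs Rj and Rk both Yes (RAW), and write result needs every other unit either not to use the destination as a source or to have already read it (WAR). The function names and bookkeeping details are assumptions for illustration, not the CDC 6600's actual logic. The usage replays the WAR situation between DIVD and ADDD from the example that follows.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FUStatus:
    """One row of the functional unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # unit that will produce Fj (None: value available)
    qk: Optional[str] = None
    rj: bool = False           # Fj ready and not yet read
    rk: bool = False


def can_issue(fu: Dict[str, FUStatus], reg_result: Dict[str, str], unit: str, dest: str) -> bool:
    # Structural hazard: unit busy.  WAW: some unit will already write dest.
    return not fu[unit].busy and dest not in reg_result


def issue(fu, reg_result, unit, op, dest, src1, src2):
    q1, q2 = reg_result.get(src1), reg_result.get(src2)
    fu[unit] = FUStatus(True, op, dest, src1, src2, q1, q2, q1 is None, q2 is None)
    reg_result[dest] = unit


def can_read_operands(fu, unit) -> bool:
    return fu[unit].rj and fu[unit].rk   # RAW: both operands available


def read_operands(fu, unit):
    fu[unit].rj = fu[unit].rk = False    # operands consumed; old values no longer needed


def can_write(fu, unit) -> bool:
    # WAR: no other unit may still be waiting to read the old value of our destination.
    fi = fu[unit].fi
    return all((f.fj != fi or not f.rj) and (f.fk != fi or not f.rk)
               for name, f in fu.items() if name != unit)


def write_result(fu, reg_result, unit):
    for f in fu.values():                # wake up units waiting for this result
        if f.qj == unit:
            f.qj, f.rj = None, True
        if f.qk == unit:
            f.qk, f.rk = None, True
    reg_result.pop(fu[unit].fi, None)
    fu[unit] = FUStatus()


fu = {u: FUStatus() for u in ("Integer", "Mult1", "Mult2", "Add", "Divide")}
regs: Dict[str, str] = {}
issue(fu, regs, "Mult1", "Mult", "F0", "F2", "F4")
read_operands(fu, "Mult1")
issue(fu, regs, "Divide", "Div", "F10", "F0", "F6")   # Qj = Mult1, Rj = No, Rk = Yes
print(can_issue(fu, regs, "Add", "F6"))               # True
issue(fu, regs, "Add", "Add", "F6", "F8", "F2")
read_operands(fu, "Add")
print(can_read_operands(fu, "Divide"))                # False: F0 not written yet
print(can_write(fu, "Add"))                           # False: WAR, DIVD still needs old F6
write_result(fu, regs, "Mult1")
read_operands(fu, "Divide")
print(can_write(fu, "Add"))                           # True
```

Clearing Rj and Rk on read is what makes the WAR check work: a unit that has already copied its operands no longer blocks a later writer.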

Scoreboard Example
Instructions (destination, then source operands j and k):
  LD    F6, 34(R2)
  LD    F2, 45(R3)
  MULTD F0, F2, F4
  SUBD  F8, F6, F2
  DIVD  F10, F0, F6
  ADDD  F6, F8, F2
Functional units: Integer, Mult1, Mult2, Add, Divide. At the start every unit is idle (Busy = No) and no register has a pending write.
* An add takes 2 clock cycles to execute, a multiply 10 clock cycles and a divide 40 clock cycles. LD uses the Integer unit.

In the cycle-by-cycle snapshots below, Time is the number of execution cycles remaining, and units that are not mentioned keep their previous state.

Cycle 1: Issue LD F6. Integer: Busy, Op = Load, Fi = F6, Fk = R2, Rk = Yes. Register result status: F6 = Integer.
Cycle 2: LD F6 reads its operands. Issue the second LD? No: the Integer unit is busy (structural hazard).
Cycle 3: LD F6 completes execution; Integer Rk = No. Issue MULTD? No: issue is in order, and the second LD has not issued yet.
Cycle 4: LD F6 writes its result; the Integer unit becomes free (Busy = No).
Cycle 5: Issue LD F2. Integer: Busy, Op = Load, Fi = F2, Fk = R3, Rk = Yes. Register result status: F2 = Integer.
Cycle 6: LD F2 reads its operands. Issue MULTD. Mult1: Busy, Op = Mult, Fi = F0, Fj = F2, Fk = F4, Qj = Integer, Rj = No, Rk = Yes. Register result status: F0 = Mult1, F2 = Integer.
Cycle 7: LD F2 completes execution (Integer Rk = No). Issue SUBD. Add: Busy, Op = Sub, Fi = F8, Fj = F6, Fk = F2, Qk = Integer, Rj = Yes, Rk = No. Register result status: F0 = Mult1, F2 = Integer, F8 = Add. Read the multiply operands? No: F2 has not been written yet.
Cycle 8a (first half of the clock cycle): Issue DIVD. Divide: Busy, Op = Div, Fi = F10, Fj = F0, Fk = F6, Qj = Mult1, Rj = No, Rk = Yes. Register result status: F0 = Mult1, F2 = Integer, F8 = Add, F10 = Divide.
Cycle 8b (second half of the clock cycle): LD F2 writes its result; the Integer unit becomes free. Mult1 and Add no longer wait for Integer: Mult1 Rj = Yes, Add Rk = Yes. Register result status: F0 = Mult1, F8 = Add, F10 = Divide.
Cycle 9: MULTD and SUBD read their operands (Time: Mult1 = 10, Add = 2). Issue ADDD? No: the Add unit is busy with SUBD (structural hazard).
Cycle 10: Mult1 Time = 9, Add Time = 1; Rj and Rk of Mult1 and Add are now No.
Cycle 11: SUBD completes execution (Add Time = 0); Mult1 Time = 8.
Cycle 12: SUBD writes its result and the Add unit becomes free; Mult1 Time = 7. Register result status: F0 = Mult1, F10 = Divide. Read operands for DIVD? No: F0 is still pending (Qj = Mult1).
Cycle 13: Issue ADDD. Add: Busy, Op = Add, Fi = F6, Fj = F8, Fk = F2, Rj = Yes, Rk = Yes. Register result status: F0 = Mult1, F6 = Add, F10 = Divide. Mult1 Time = 6.
Cycle 14: ADDD reads its operands (Add Time = 2); Mult1 Time = 5.
Cycle 15: Add Time = 1; Mult1 Time = 4.
Cycle 16: ADDD completes execution (Add Time = 0); Mult1 Time = 3.
Cycle 17: Mult1 Time = 2. Why not write the result of ADDD? WAR hazard: DIVD has not yet read F6 (Divide Rk = Yes), and ADDD would overwrite F6.
Cycle 18: Mult1 Time = 1.
Cycle 19: MULTD completes execution (Mult1 Time = 0).
Cycle 20: MULTD writes its result and Mult1 becomes free. Divide: Qj cleared, Rj = Yes, Rk = Yes. Register result status: F6 = Add, F10 = Divide.
Cycle 21: DIVD reads its operands. The WAR hazard is now gone.
Cycle 22: ADDD writes its result and the Add unit becomes free. Divide: Time = 39, Rj = No, Rk = No.

Instruction status at cycle 22:
Instruction      | Issue | Read operands | Execution complete | Write result
LD F6, 34(R2)    | 1     | 2             | 3                  | 4
LD F2, 45(R3)    | 5     | 6             | 7                  | 8
MULTD F0, F2, F4 | 6     | 9             | 19                 | 20
SUBD F8, F6, F2  | 7     | 9             | 11                 | 12
DIVD F10, F0, F6 | 8     | 21            |                    |
ADDD F6, F8, F2  | 13    | 14            | 16                 | 22
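The instruction status table above can be reproduced with a short timing model. This Python sketch is not a cycle-by-cycle hardware simulator; it computes each instruction's four times in program order under rules assumed from the example: one issue per cycle, in program order; issue waits for a free unit of the right type (Mult1 or Mult2 for multiplies) and for no pending write to the same destination (WAW); a unit can take a new instruction in the cycle after its result is written; an operand can be read in the cycle after its producer writes it; execution takes 1 cycle for LD on the Integer unit, and 2, 10 or 40 cycles for add, multiply and divide; and a result is written only after every earlier instruction that reads the destination register has read its operands (WAR). The integer base registers R2 and R3 are treated as always ready.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LATENCY = {"Integer": 1, "Add": 2, "Mult": 10, "Divide": 40}   # assumed LD latency: 1
UNITS = {"Integer": ["Integer"], "Add": ["Add"], "Mult": ["Mult1", "Mult2"], "Divide": ["Divide"]}


@dataclass
class Instr:
    text: str
    unit: str                 # functional-unit type
    dest: str
    srcs: Tuple[str, ...]
    issue: int = 0
    read: int = 0
    complete: int = 0
    write: int = 0
    fu: Optional[str] = None


def run(program):
    """Fill in issue/read/complete/write cycles for an in-order-issue scoreboard."""
    busy_until = {u: 0 for names in UNITS.values() for u in names}  # cycle of last write
    done, last_issue = [], 0
    for ins in program:
        # Issue: in order, a free unit of the right type, no WAW with an active instruction.
        t = last_issue + 1
        while True:
            free = [u for u in UNITS[ins.unit] if busy_until[u] < t]
            waw = any(p.dest == ins.dest and p.write >= t for p in done)
            if free and not waw:
                break
            t += 1
        ins.issue, ins.fu, last_issue = t, free[0], t
        # Read operands: once the latest earlier writer of each source has written (RAW).
        t = ins.issue + 1
        for s in ins.srcs:
            writers = [p for p in done if p.dest == s]
            if writers:
                t = max(t, writers[-1].write + 1)
        ins.read = t
        ins.complete = ins.read + LATENCY[ins.unit]
        # Write result: once no earlier instruction still has to read the old value (WAR).
        t = ins.complete + 1
        for p in done:
            if ins.dest in p.srcs:
                t = max(t, p.read + 1)
        ins.write = t
        busy_until[ins.fu] = ins.write
        done.append(ins)
    return done


PROGRAM = [
    Instr("LD F6,34(R2)", "Integer", "F6", ("R2",)),
    Instr("LD F2,45(R3)", "Integer", "F2", ("R3",)),
    Instr("MULTD F0,F2,F4", "Mult", "F0", ("F2", "F4")),
    Instr("SUBD F8,F6,F2", "Add", "F8", ("F6", "F2")),
    Instr("DIVD F10,F0,F6", "Divide", "F10", ("F0", "F6")),
    Instr("ADDD F6,F8,F2", "Add", "F6", ("F8", "F2")),
]

for ins in run(PROGRAM):
    print(f"{ins.text:16} {ins.fu:8} {ins.issue:3} {ins.read:3} {ins.complete:3} {ins.write:3}")
```

The first four columns match the table, including ADDD waiting until cycle 22 for DIVD to read F6. Under the same assumptions the model also predicts DIVD completing at cycle 61 and writing at cycle 62, which the snapshots do not reach.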
