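Experiment 2: Instruction Pipeline Hazard Analysis

Experiment type: verification
Objective: Use the WinDLX simulator to observe the three kinds of hazards that occur in a program, examine how techniques such as forwarding paths and additional functional units affect performance, and deepen understanding of pipelining and RISC processors.
Class hours: 4
Group size: 1
Equipment and environment: The WinDLX simulator loads DLX assembly programs and can single-step them, run them to a breakpoint, or run them continuously. The CPU registers, pipeline, I/O, and memory are all displayed graphically, and the simulator also collects statistics on pipeline operation. This makes it a useful tool for understanding pipelining and RISC processors.
Principle: An instruction pipeline is subject to three main kinds of hazards: structural, data, and control. Hazards degrade pipeline performance.
Content and requirements: Use the WinDLX simulator to analyze the factorial program Fact.s.

Procedure:

1. Observe the data, control, and structural hazards in the program

Data hazard in the program

[Figure: WinDLX clock cycle diagram for lbu, seqi r5,r3,0xa and bnez r5,input.Finish, showing an R-Stall. Instruction information for seqi r5,r3,0xa (address input.Loop+0x4): first cycle -6, last cycle -1, 6 cycles in total; IF cycles: -6 (1).]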

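[Instruction information, continued: in IF the instruction is fetched from input.Loop+0x4 and the PC advances to input.Loop+0x8, with no stall required; ID cycles: -5 (2), with 1 stall because of a RAW hazard with lbu r3,0x0(r2).]

lbu r3,0x0(r2) does not write its value into r3 until its WB stage, while the following instruction, seqi r5,r3,0x0a, needs the value of r3 in its intEX stage. This is a read-after-write (RAW) hazard. To avoid the conflict, the intEX stage of seqi r5,r3,0x0a is delayed by one cycle.

Control hazard in the program

    addi r1,r0,0x1000
    jal InputUnsigned
    movi2fp f10,r1

movi2fp f10,r1 is marked as aborted after its IF cycle. The reason is that the second instruction, jal InputUnsigned, is an unconditional jump, and its target is only known once the instruction has been decoded. By then movi2fp f10,r1 has already been fetched, so it has to be flushed from the pipeline. Because it has only completed IF, all that is needed is to fetch the instruction at the jump target instead.

Structural hazard in the program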

[Figure: WinDLX clock cycle diagram for multu, add r1,r1,r3 and addi r2,r2,0x1. Instruction information for addi r2,r2,0x1 (address input.Loop+0x18, code 0x20420001): first cycle -13, last cycle -5, 9 cycles in total; IF cycles: -13 (5), with 4 stalls because of a structural hazard.]

The intEX stage of the preceding instruction, add r1,r1,r3, is delayed by four cycles, so addi r2,r2,0x1 cannot enter ID until add r1,r1,r3 has moved on to intEX. The two instructions contend for the instruction decode stage, and the result is a structural hazard.

2. Examine the effect of additional floating-point units on performance

The two sets of data below come from the Statistics window. Both runs compute the factorial of 5: one with a single unit of each type, the other with two units of each type.

Floating Point Stage Configuration

                          One unit each     Two units each
                          Count   Delay     Count   Delay
    Addition Units          1       2         2       2
    Multiplication Units    1       5         2       5
    Division Units          1      19         2      19

Statistics (both runs with forwarding enabled, memory size 32768 bytes):

                                        One unit each     Two units each
    Total cycles executed               95                95
    Instructions executed by ID         62                62
    Instructions currently in pipeline  2                 2

    Hardware configuration
    faddEX-Stages / required cycles     1 / 2             2 / 2
    fmulEX-Stages / required cycles     1 / 5             2 / 5
    fdivEX-Stages / required cycles     1 / 19            2 / 19

    Stalls (% of all cycles)
    RAW stalls                          10 (10.53%)       10 (10.53%)
      LD stalls (% of RAW)              2 (20.00%)        2 (20.00%)
      Branch/Jump stalls (% of RAW)     2 (20.00%)        2 (20.00%)
      Floating point stalls (% of RAW)  6 (60.00%)        6 (60.00%)
    WAW stalls                          0 (0.00%)         0 (0.00%)
    Structural stalls                   0 (0.00%)         0 (0.00%)
    Control stalls                      9 (9.47%)         9 (9.47%)
    Trap stalls                         12 (12.63%)       12 (12.63%)
    Total                               31 (32.63%)       31 (32.63%)

    Conditional branches (% of all instructions)
    Total                               7 (11.29%)        7 (11.29%)
      taken                             2 (28.57%)        2 (28.57%)
      not taken                         5 (71.43%)        5 (71.43%)

    Load/Store instructions (% of all instructions)
    Total                               12 (19.35%)       12 (19.35%)
      Loads                             6 (50.00%)        6 (50.00%)
      Stores                            6 (50.00%)        6 (50.00%)

    Floating point stage instructions (% of all instructions)
    Total                               9 (14.52%)        9 (14.52%)
      Additions                         4 (44.44%)        4 (44.44%)
      Multiplications                   5 (55.56%)        5 (55.56%)
      Divisions                         0 (0.00%)         0 (0.00%)

    Traps (% of all instructions)       4 (6.45%)         4 (6.45%)

Comparing the two runs shows that their performance statistics are identical. For this program, therefore, adding floating-point units has no effect on performance.

3. Examine the effect of forwarding on performance

The statistics in this part compare a run with forwarding enabled against a run with forwarding disabled.
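A rough sketch of why forwarding changes the number of RAW stalls, assuming a textbook five-stage pipeline (IF, ID, EX, MEM, WB) rather than WinDLX's actual implementation. The function name `raw_stalls`, the two producer categories, and the assumption that the register file is written in the first half of WB and read in the second half of ID are all illustrative choices, not taken from the simulator.

```python
def raw_stalls(producer: str, distance: int, forwarding: bool) -> int:
    """Estimate stall cycles for an instruction that reads a register
    written by an earlier instruction.

    producer   -- "alu" (result ready at end of EX) or "load" (ready at end of MEM)
    distance   -- how many instructions later the consumer is issued (1 = next)
    forwarding -- whether results are bypassed directly into EX

    Assumed model (hypothetical, textbook-style): without forwarding the
    consumer's ID must not come before the producer's WB; with forwarding
    the consumer's EX only has to follow the cycle that produces the value.
    """
    if producer not in ("alu", "load"):
        raise ValueError("producer must be 'alu' or 'load'")
    if distance < 1:
        raise ValueError("distance must be at least 1")

    if forwarding:
        # ALU result: usable by the very next EX. Load result: one cycle later.
        required_gap = 1 if producer == "alu" else 2
    else:
        # Producer WB is 3 stages after its ID, and the consumer's ID
        # may share that cycle (write first half, read second half).
        required_gap = 3

    return max(0, required_gap - distance)


if __name__ == "__main__":
    for fwd in (True, False):
        label = "forwarding" if fwd else "no forwarding"
        print(label,
              "load->use:", raw_stalls("load", 1, fwd),
              "alu->use:", raw_stalls("alu", 1, fwd))
```

Under these assumptions a load followed immediately by a dependent instruction costs one stall with forwarding, which agrees with the single stall observed for lbu r3,0x0(r2) followed by seqi r5,r3,0x0a, and two stalls without it.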

Forwarding enabled:

    Total: 95 cycle(s) executed. ID executed by 62 instruction(s). 2 instruction(s) currently in pipeline.
    Hardware configuration: memory size 32768 bytes; faddEX-Stages: 1, required cycles: 2; fmulEX-Stages: 1, required cycles: 5; fdivEX-Stages: 1, required cycles: 19; forwarding enabled.
    Stalls:
      RAW stalls: 10 (10.53% of all cycles), thereof:
        LD stalls: 2 (20.00% of RAW stalls)
        Branch/Jump stalls: 2 (20.00% of RAW stalls)
        Floating point stalls: 6 (60.00% of RAW stalls)
      WAW stalls: 0 (0.00% of all cycles)
      Structural stalls: 0 (0.00% of all cycles)
      Control stalls: 9 (9.47% of all cycles)
      Trap stalls: 12 (12.63% of all cycles)
      Total: 31 stall(s) (32.63% of all cycles)
    Conditional branches: total 7 (11.29% of all instructions), thereof taken: 2 (28.57%), not taken: 5 (71.43%)
    Load/Store instructions: total 12 (19.35% of all instructions), thereof loads: 6 (50.00%), stores: 6 (50.00%)
    Floating point stage instructions: total 9 (14.52% of all instructions), thereof additions: 4 (44.44%), multiplications: 5 (55.56%), divisions: 0 (0.00%)
    Traps: 4 (6.45% of all instructions)

Forwarding disabled:

    Total: 112 cycle(s) executed. ID executed by 62 instruction(s). 2 instruction(s) currently in pipeline.
    Hardware configuration: memory size 32768 bytes; faddEX-Stages: 1, required cycles: 2; fmulEX-Stages: 1, required cycles: 5; fdivEX-Stages: 1, required cycles: 19; forwarding disabled.
    Stalls:
      RAW stalls: 28 (25.00% of all cycles)
      WAW stalls: 0 (0.00% of all cycles)
      Structural stalls: 0 (0.00% of all cycles)
      Control stalls: 9 (8.04% of all cycles)
      Trap stalls: 12 (10.71% of all cycles)
      Total: 49 stall(s) (43.75% of all cycles)
    Conditional branches: total 7 (11.29% of all instructions), thereof taken: 2 (28.57%), not taken: 5 (71.43%)
    Load/Store instructions: total 12 (19.35% of all instructions), thereof loads: 6 (50.00%), stores: 6 (50.00%)
    Floating point stage instructions: total 9 (14.52% of all instructions), thereof additions: 4 (44.44%), multiplications: 5 (55.56%), divisions: 0 (0.00% of floating point stage inst.)
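The two runs can be compared numerically with a short script. This is only a sketch for checking the arithmetic: the cycle and stall counts are the ones reported by the Statistics window, while the dictionary layout, the helper name `summarize`, and the use of "total cycles divided by instructions executed by ID" as an approximate CPI are assumptions made for illustration.

```python
# Counts copied from the WinDLX Statistics window for the two runs.
INSTRUCTIONS = 62  # instructions executed by ID in both runs

RUNS = {
    "forwarding enabled": {"cycles": 95, "raw": 10, "control": 9, "trap": 12},
    "forwarding disabled": {"cycles": 112, "raw": 28, "control": 9, "trap": 12},
}


def summarize(run: dict) -> dict:
    """Return total stalls, stall share of all cycles, and approximate CPI."""
    stalls = run["raw"] + run["control"] + run["trap"]
    return {
        "stalls": stalls,
        "stall_pct": round(100 * stalls / run["cycles"], 2),
        # Approximation (assumption): cycles per instruction seen by ID.
        "cpi": round(run["cycles"] / INSTRUCTIONS, 2),
    }


def speedup(slow: dict, fast: dict) -> float:
    """Ratio of execution times, measured in clock cycles."""
    return round(slow["cycles"] / fast["cycles"], 2)


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, summarize(run))
    print("speedup from forwarding:",
          speedup(RUNS["forwarding disabled"], RUNS["forwarding enabled"]))
```

With the reported figures, RAW stalls fall from 28 to 10 and the total cycle count from 112 to 95 when forwarding is enabled, which is a speedup of roughly 1.18 for this program; control and trap stalls are unchanged.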


