BUPT Computer Architecture Lab Report: Labs 1 to 5 (WinDLX Simulator)

Beijing University of Posts and Telecommunications, Lab Report
Course: Computer Architecture
School of Computer Science, Class 03
Author: 王陈 (11)

Contents

Lab 1: Installing and Using the WinDLX Simulator
    Preparation; Environment; Steps; Content and requirements; Procedure; Summary
Lab 2: Instruction Pipeline Hazard Analysis
    Objective; Environment; Steps; Procedure; Summary
Lab 3: Programming the DLX Processor
    Objective; Environment; Steps; Procedure (A. Vector addition: code and performance analysis; B. Double-precision floating-point addition: code and result analysis); Summary
Lab 4: Code Optimization
    Objective; Environment; Principle; Steps; Procedure; Summary and reflections
Lab 5: Loop Unrolling
    Objective; Environment; Principle; Steps; Procedure (matrix multiplication code listing with comments; hazard analysis results; effect of adding floating-point units; effect of adding forwarding; pipeline cost of taken and not-taken branches); Summary, reflections and course suggestions

Lab 1: Installing and Using the WinDLX Simulator

Structure and functions of the WinDLX simulator

After WinDLX is started, the main window shown in the figure below appears. It contains six child windows: Register, Code, Pipeline, Clock Cycle Diagram, Statistics and Breakpoints. The structure of the simulator and the function of each window are described next.

Register window

The Register window shows the name and contents of every register. Values are displayed in hexadecimal.

[Figure: Register window listing PC and the other special registers, the integer registers R0 to R31, the floating-point registers F0 to F31 and the double-precision registers D0 to D30. Every register reads zero in the initial state.]

Code window

Before anything has been executed, the Code window looks like this:

    $TEXT         0x00000000    nop
    $TEXT+0x4     0x00000000    nop
    $TEXT+0x8     0x00000000    nop
    $TEXT+0xc     0x00000000    nop
    $TEXT+0x10    0x00000000    nop
    $TEXT+0x14    0x00000000    nop
    $TEXT+0x18    0x00000000    nop
    $TEXT+0x1c    0x00000000    nop
    $TEXT+0x20    0x00000000    nop
    $TEXT+0x24    0x00000000    nop

The window shows the contents of memory. The first column is the memory address, the second is the machine code in hexadecimal, and the third is the assembly instruction.

From the Execute menu you can run a single cycle or several cycles (shortcuts F7 and F8). In single-cycle mode each press of F7 advances the simulation by one clock cycle, so an instruction moves through IF -> ID -> intEX -> MEM -> WB, and the colors change at every step. The color of a line identifies the pipeline stage the instruction is currently in:

    $TEXT         0x00000000             nop
    $TEXT+0x4     0x00000000             nop
    $TEXT+0x8     0x00000000    WB       nop
    $TEXT+0xc     0x00000000    MEM      nop
    $TEXT+0x10    0x00000000    intEX    nop
    $TEXT+0x14    0x00000000    ID       nop
    $TEXT+0x18    0x00000000    IF       nop
    $TEXT+0x1c    0x00000000             nop
    $TEXT+0x20    0x00000000             nop
    $TEXT+0x24    0x00000000             nop

To run several cycles at once, press F8 and enter 5 to advance five cycles.

Pipeline window

According to the WinDLX manual, the Pipeline window shows the internal structure of the DLX processor, drawn as its five pipeline stages. As in the Code window, color indicates which stage an instruction occupies. Stepping with F7 makes it easy to see which instruction each stage is executing at each moment, as in the figure below.

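[Figure: Pipeline window with the stages IF, ID, intEX, MEM and WB and the floating-point units faddEX, fmulEX and fdivEX.]

The figure shows the same instructions, at the same moment, as the Code window, so it is clear which instruction each stage is executing.

Clock Cycle Diagram window

As noted in the lab preparation, this window shows the space-time diagram of the pipeline, that is, what each stage is doing in every clock cycle.

[Figure: Clock Cycle Diagram with the instructions (all nop) on the vertical axis and the cycles on the horizontal axis.]

I find the space-time diagram the easiest view to understand, because it directly shows how much the pipeline stages overlap. The program here consists only of nop instructions, so no data or control hazards occur, and the diagram shows each instruction's stage and progress cleanly. The diagram corresponds to the Code window and the other views. Pressing F7 advances execution and extends the diagram by one cycle. Double-clicking any instruction row opens a dialog with the details of each pipeline stage, as shown below.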
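The space-time diagram described above can be reproduced for the hazard-free case with a few lines of Python. This is only an illustrative sketch, assuming an ideal five-stage pipeline in which instruction i enters IF in cycle i and never stalls; the function names `space_time_diagram` and `render` are hypothetical and are not part of WinDLX.

```python
# Ideal five-stage DLX pipeline: one instruction enters IF per cycle, no stalls.
STAGES = ["IF", "ID", "intEX", "MEM", "WB"]


def space_time_diagram(instructions):
    """Return one row per instruction; row[c] is the stage occupied in cycle c ("" if none)."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, _name in enumerate(instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


def render(instructions):
    """Format the diagram as text, one instruction per line."""
    lines = []
    for name, row in zip(instructions, space_time_diagram(instructions)):
        cells = " ".join(f"{cell:<5}" for cell in row)
        lines.append(f"{name:<6}{cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(["nop"] * 4))
```

Each row is shifted one cycle to the right of the previous one, which is the staircase pattern the Clock Cycle Diagram window draws for the all-nop program.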

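[Figure: "Information about nop" dialog for the instruction at $TEXT (code 0x00000000). For each of IF, ID, intEX, MEM and WB it lists the cycle in which the stage ran and the register transfers performed, for example IMAR <- PC and PC <- PC+4 (= $TEXT+0x4) in IF. Every stage terminated successfully with no stalls and no forwarding, and the instruction took 5 cycles in total.]

Statistics window

This window analyzes the data collected while the program runs, including the simulated hardware configuration, so it can be used to compare the effect of different configurations. It reports the following groups, shown here for the all-nop program:

1) Overall execution
    ID executed by 6 Instruction(s).
    5 Instruction(s) currently in Pipeline.
2) Hardware configuration
    Memory size: 32768 Bytes
    faddEX-Stages: 1, required Cycles: 2
    fmulEX-Stages: 1, required Cycles: 5
    fdivEX-Stages: 1, required Cycles: 19
    Forwarding enabled.
3) Stalls: count, percentage and cause
    RAW stalls: 0 (0.00% of all Cycles), thereof:
        LD stalls: 0 (0.00% of RAW stalls)
        Branch/Jump stalls: 0 (0.00% of RAW stalls)
        Floating point stalls: 0 (0.00% of RAW stalls)
    WAW stalls: 0 (0.00% of all Cycles)
    Structural stalls: 0 (0.00% of all Cycles)
    Control stalls: 0 (0.00% of all Cycles)
    Trap stalls: 0 (0.00% of all Cycles)
    Total: 0 Stall(s) (0.00% of all Cycles)
4) Conditional branches: count and percentage
    Total: 0 (0.00% of all Instructions), thereof:
        taken: 0 (0.00% of all cond. Branches)
        not taken: 0 (0.00% of all cond. Branches)
5) Load/Store instructions
    Total: 0 (0.00% of all Instructions), thereof:
        Loads: 0 (0.00% of Load-/Store-Instructions)
        Stores: 0 (0.00% of Load-/Store-Instructions)
6) Floating-point stage instructions: count and percentage
    Total: 0 (0.00% of all Instructions), thereof:
        Additions: 0 (0.00% of Floating point stage inst.)
        Multiplications: 0 (0.00% of Floating point stage inst.)
        Divisions: 0 (0.00% of Floating point stage inst.)
7) Traps: count and percentage
    Traps: 0 (0.00% of all Instructions)

Breakpoints window

This window controls where execution stops. Open the Breakpoints window and use the menu at the top of the window to set a breakpoint, that is, to choose the pipeline stage at which the program halts when the selected instruction reaches it. If the EX stage is chosen, the corresponding line in the Code window is marked BEX, and the program stops when that instruction has finished decoding and is about to start executing.

Lab summary

This was my first contact with the DLX simulator, and the lab gave me a general picture of what it can do and how to use it, which is a good foundation for the later labs. WinDLX is small but capable: it marks pipeline stages with colors and shows the contents of the registers and of memory. With it, the five-stage pipeline and its individual stages can be seen clearly, as can the stage each instruction has reached.

Lab 2: Instruction Pipeline Hazard Analysis

• Objective

Use the WinDLX simulator to observe the three kinds of hazard in a program, and examine how techniques such as forwarding paths and additional functional units affect performance, in order to deepen the understanding of pipelining and of the characteristics of RISC processors.

• Environment

Windows XP; WinDLX simulator.

• Steps

1. Observe the data, control and structural hazards in the program, and identify the instruction combinations that cause them.
2. Examine the effect of adding floating-point units on performance.
3. Examine the effect of adding forwarding on performance.
4. Observe the pipeline cost of a branch when it is taken and when it is not taken.

• Procedure

1. Observe the data, control and structural hazards in the program, and identify the instruction combinations that cause them.

1) Data hazard

As the figure below shows, an R-Stall appears for the first time in both the space-time diagram of the Clock Cycle Diagram window and the flow view of the Pipeline window.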

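Clicking the orange box in the figure above shows the details: lbu r3,0x0(r2) writes r3 back in its WB cycle, while the next instruction, seqi r5,r3,0xa, needs to read r3 in its intEX cycle. This is a RAW (read-after-write) data hazard. To resolve it, the intEX stage of seqi r5,r3,0xa is delayed by one cycle. The instructions involved are:

    Input.Loop    0x90430000    lbu  r3,0x0(r2)
    0x00000178    0x6465000a    seqi r5,r3,0xa

2) Control hazard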
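The dependence between lbu and seqi can be found mechanically by comparing the register an instruction writes with the registers its successor reads. The Python sketch below illustrates that check only; it is not how WinDLX detects stalls internally, and the `Instr` tuple and `raw_pairs` function are hypothetical names introduced for this example.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    text: str                 # assembly text, for display only
    dest: str                 # register written ("" if none)
    sources: Tuple[str, ...]  # registers read


def raw_pairs(program: List[Instr], window: int = 1):
    """Return (producer, consumer, register) index triples for RAW dependences.

    Only consumers at most `window` instructions after the producer are
    reported, since those are the ones close enough to need a stall.
    """
    found = []
    for i, producer in enumerate(program):
        if not producer.dest or producer.dest == "r0":
            continue  # r0 is hard-wired to zero, writes to it create no dependence
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if producer.dest in program[j].sources:
                found.append((i, j, producer.dest))
            if program[j].dest == producer.dest:
                break  # the register is overwritten; later readers depend on j
    return found


# The two instructions named in the report.
program = [
    Instr("lbu r3,0x0(r2)", "r3", ("r2",)),
    Instr("seqi r5,r3,0xa", "r5", ("r3",)),
]

if __name__ == "__main__":
    for i, j, reg in raw_pairs(program):
        print(f"RAW on {reg}: '{program[i].text}' -> '{program[j].text}'")
```

For the pair above the function reports a single RAW dependence on r3, matching the R-Stall the simulator shows.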

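[Figure: Clock Cycle Diagram.] The second instruction is in the intEX stage, the third is marked aborted, and the fourth is in the IF stage.

Analysis: jal InputUnsigned is an unconditional jump, but the processor only knows this once jal has been decoded, at the start of the third cycle. By then movi2fp has already been fetched, while the next instruction to execute is at a different address. The fetched movi2fp is therefore abandoned, which is a control hazard. The instructions involved are:

    $TEXT         0x20011000    addi    r1,r0,0x1000
    main+0x4      0x0c00003c    jal     InputUnsigned
    main+0x8      0x00205035    movi2fp f10,r1
    0x00000144    0xac021094    sw      SaveR2(r0),r2

3) Structural hazard

First, look at the Clock Cycle Diagram and the Pipeline window after the control hazard has been executed, as shown below.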
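The cost of the jump can be illustrated with a small fetch model. The Python sketch below assumes that a jump is recognized one cycle after it is fetched (in ID), so exactly one sequentially fetched instruction is aborted; that matches the single aborted movi2fp described above but is a simplification of the real pipeline. The function name `fetch_trace` is hypothetical, and the address 0x100 for main is an assumption made for the example (only 0x144 for InputUnsigned appears in the report).

```python
def fetch_trace(program, start, max_fetches=10):
    """Simulate instruction fetch with jumps resolved in ID.

    program maps address -> (text, jump_target or None).
    Returns a list of (address, text, status) in fetch order, where status is
    "executed" or "aborted".
    """
    trace = []
    pc = start
    pending_jump = None  # target of the jump fetched in the previous cycle
    while pc in program and len(trace) < max_fetches:
        text, target = program[pc]
        if pending_jump is not None:
            # The jump is being decoded while this instruction is fetched,
            # so this fetch is thrown away and the PC is redirected.
            trace.append((pc, text, "aborted"))
            pc = pending_jump
            pending_jump = None
            continue
        trace.append((pc, text, "executed"))
        pending_jump = target
        pc += 4
    return trace


# Assumed layout: main at 0x100 (hypothetical), InputUnsigned at 0x144.
program = {
    0x100: ("addi r1,r0,0x1000", None),
    0x104: ("jal InputUnsigned", 0x144),
    0x108: ("movi2fp f10,r1", None),
    0x144: ("sw SaveR2(r0),r2", None),
}

if __name__ == "__main__":
    for addr, text, status in fetch_trace(program, 0x100):
        print(f"{addr:#06x}  {text:<20} {status}")
```

The trace lists movi2fp as aborted and continues at 0x144, so each taken jump costs one fetch slot in this model.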

Clicking the IF box in the Pipeline window shows the details of the instruction in that stage:

    IF    Cycles: -2 (3)    In Pipeline
        IMAR <- PC (= Input.Loop+0x18)
        IR <- Mem[IMAR] (= 0x20420001)
        PC <- PC+4 (= Input.Loop+0x1c)
        2 Stall(s) because of Structural Hazard!

These are the details for addi r2,r2,0x1. It has a structural hazard with the preceding instruction, add r1,r1,r3: the decode stage is still occupied, so addi r2,r2,0x1 cannot enter ID and has to stall for 2 cycles. Once the preceding instruction has finished waiting in ID and moved on to intEX, addi can proceed. The instructions involved are:

    0x00000188    0x00230820    add  r1,r1,r3
    0x0000018c    0x20420001    addi r2,r2,0x1

2. Examine the effect of adding floating-point units on performance.

The experiment uses the input N = 6. Open Configuration and choose Floating Point Stage Configuration to set up the floating-point units. The lab handout requires Delay = 4, so the Delay column is set to 4. Count may be chosen freely; for comparison, all counts are set to 2 in the first run and to 3 in the second:

                              Run 1             Run 2
                              Count  Delay      Count  Delay
    Addition Units:             2      4          3      4
    Multiplication Units:       2      4          3      4
    Division Units:             2      4          3      4

    Number of Units in each Class: 1 <= M <= 8, Delay (Clock Cycles): 1 <= N <= 50
    WARNING: If you change the values, the processor will be reset automatically!

After running 50 cycles, the statistics of the two runs can be compared. With 3 units of each kind:

    Total:
        50 Cycle(s) executed.
        ID executed by 32 Instruction(s).
        4 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 3, required Cycles: 4
        fmulEX-Stages: 3, required Cycles: 4
        fdivEX-Stages: 3, required Cycles: 4
        Forwarding enabled.
    Stalls:
        RAW stalls: 9 (18.00% of all Cycles), thereof:
            LD stalls: 2 (22.22% of RAW stalls)
            Branch/Jump stalls: 2 (22.22% of RAW stalls)
            Floating point stalls: 5 (55.56% of RAW stalls)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 4 (8.00% of all Cycles)
        Trap stalls: 6 (12.00% of all Cycles)
        Total: 19 Stall(s) (38.00% of all Cycles)
    Conditional Branches:
        Total: 2 (6.25% of all Instructions), thereof:
            taken: 1 (50.00% of all cond. Branches)
            not taken: 1 (50.00% of all cond. Branches)
    Load-/Store-Instructions:
        Total: 11 (34.38% of all Instructions), thereof:
            Loads: 6 (54.55% of Load-/Store-Instructions)
            Stores: 5 (45.45% of Load-/Store-Instructions)
    Floating point stage instructions:
        Total: 1 (3.12% of all Instructions), thereof:
            Additions: 0 (0.00% of Floating point stage inst.)
            Multiplications: 1 (100.00% of Floating point stage inst.)
            Divisions: 0 (0.00% of Floating point stage inst.)
    Traps:
        Traps: 2 (6.25% of all Instructions)
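The percentages in a Statistics listing follow directly from the stall counts and the number of executed cycles. The short Python sketch below recomputes them for the 50-cycle run above as a consistency check; `stall_report` is a hypothetical helper written for this report, not a WinDLX function.

```python
def stall_report(cycles, stalls):
    """Return (total stalls, total as % of cycles, per-kind % of cycles)."""
    total = sum(stalls.values())
    shares = {kind: round(100.0 * n / cycles, 2) for kind, n in stalls.items()}
    return total, round(100.0 * total / cycles, 2), shares


# Stall counts reported for the 50-cycle run with forwarding enabled.
run = {"RAW": 9, "WAW": 0, "Structural": 0, "Control": 4, "Trap": 6}
# Breakdown of the 9 RAW stalls by cause.
raw_causes = {"LD": 2, "Branch/Jump": 2, "Floating point": 5}

if __name__ == "__main__":
    total, pct, shares = stall_report(50, run)
    print(f"Total: {total} stalls ({pct:.2f}% of all cycles)")
    for kind, share in shares.items():
        print(f"  {kind}: {share:.2f}%")
    for cause, n in raw_causes.items():
        print(f"  {cause}: {100.0 * n / run['RAW']:.2f}% of RAW stalls")
```

The recomputed totals (19 stalls, 38% of all cycles) agree with the listing, and the three RAW causes add up to the 9 RAW stalls.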

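With 2 units of each kind:

    Total:
        50 Cycle(s) executed.
        ID executed by 32 Instruction(s).
        4 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 2, required Cycles: 4
        fmulEX-Stages: 2, required Cycles: 4
        fdivEX-Stages: 2, required Cycles: 4
        Forwarding enabled.
    Stalls:
        RAW stalls: 9 (18.00% of all Cycles), thereof:
            LD stalls: 2 (22.22% of RAW stalls)
            Branch/Jump stalls: 2 (22.22% of RAW stalls)
            Floating point stalls: 5 (55.56% of RAW stalls)
        WAW stalls: 0, Structural stalls: 0
        Control stalls: 4 (8.00% of all Cycles)
        Trap stalls: 6 (12.00% of all Cycles)
        Total: 19 Stall(s) (38.00% of all Cycles)
    Conditional Branches: Total: 2 (6.25% of all Instructions); taken: 1 (50.00%), not taken: 1 (50.00%)
    Load-/Store-Instructions: Total: 11 (34.38% of all Instructions); Loads: 6 (54.55%), Stores: 5 (45.45%)
    Floating point stage instructions: Total: 1 (3.12% of all Instructions); Additions: 0, Multiplications: 1 (100.00%), Divisions: 0

Changing the number of floating-point units therefore has no effect on performance here: every figure is identical in the two runs, however many units are added. The reason is that the floating-point instructions in this program never overlap, so extra units add no parallelism and performance does not improve.

3. Examine the effect of adding forwarding on performance.

To compare performance with and without forwarding, run the program once with "enable forwarding" checked in Configuration and once with it unchecked, and compare the statistics.

Without forwarding:

    Total:
        50 Cycle(s) executed.
        ID executed by 27 Instruction(s).
        4 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 3, required Cycles: 4
        fmulEX-Stages: 3, required Cycles: 4
        fdivEX-Stages: 3, required Cycles: 4
        Forwarding disabled.
    Stalls:
        RAW stalls: 13 (26.00% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 3 (6.00% of all Cycles)
        Trap stalls: 6 (12.00% of all Cycles)
        Total: 22 Stall(s) (44.00% of all Cycles)

With forwarding:

    Total:
        50 Cycle(s) executed.
        ID executed by 32 Instruction(s).
        4 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 3, required Cycles: 4
        fmulEX-Stages: 3, required Cycles: 4
        fdivEX-Stages: 3, required Cycles: 4
        Forwarding enabled.
    Stalls:
        RAW stalls: 9 (18.00% of all Cycles), thereof:
            LD stalls: 2 (22.22% of RAW stalls)
            Branch/Jump stalls: 2 (22.22% of RAW stalls)
            Floating point stalls: 5 (55.56% of RAW stalls)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 4 (8.00% of all Cycles)
        Trap stalls: 6 (12.00% of all Cycles)
        Total: 19 Stall(s) (38.00% of all Cycles)

With forwarding, RAW stalls drop from 13 (26% of all cycles) to 9 (18%), while the share of control stalls rises from 6% to 8%. In the same 50 cycles, 32 instructions pass the ID stage instead of 27. Forwarding therefore reduces data-hazard stalls and improves pipeline performance to a degree.

4. Observe the pipeline cost of a branch when it is taken and when it is not taken.

The floating-point units are set to Count = 3, Delay = 4, and the input is N = 6. After the program has run to completion, the conditional-branch statistics are:

    Conditional Branches:
        Total: 8 (12.12% of all Instructions), thereof:
            taken: 2 (25.00% of all cond. Branches)
            not taken: 6 (75.00% of all cond. Branches)

There are 8 conditional branches in total; 2 are taken (25%) and 6 are not taken. Static instruction scheduling can only address data hazards, so the conditional-branch figures are unchanged from before. When a branch is not taken, pipeline execution is unaffected and neither throughput nor efficiency drops. When a branch is taken, the instruction already fetched must be discarded and fetching restarts at the branch target, so efficiency drops.

Lab summary

The main problem in this lab was that loading the program files failed at first. After consulting references and experimenting, I found that the order in which the files are selected matters, because it determines the order in which they are placed in memory. The lab covered observing the three kinds of hazard and the instructions that cause them, analyzing the effect of floating-point units and forwarding on performance, and observing the pipeline cost of taken and not-taken branches. Working through these steps with WinDLX's visual presentation gave me a much deeper practical understanding of pipelining.

Lab 3: Programming the DLX Processor

• Objective

Learn to program in DLX assembly language and analyze hazards further.

• Procedure

A. Vector addition: code and performance analysis

The task requires a working knowledge of DLX assembly; the vector code is then written in the prescribed format.

1) Vector declarations (the labels of the two source vectors are illegible in the source)

    VectorLength:    .word 16
    (source vector): .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
    (source vector): .word 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
    Result:          .space 4

Output in the DLX Standard I/O window:

    2.000000 4.000000 6.000000 8.000000 10.000000 12.000000 14.000000 16.000000
    18.000000 20.000000 22.000000 24.000000 26.000000 28.000000 30.000000 32.000000

Statistics with forwarding enabled:

    Total:
        283 Cycle(s) executed.
        ID executed by 181 Instruction(s).
        2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding enabled.
    Stalls:
        RAW stalls: 32 (11.31% of all Cycles), thereof:
            LD stalls: 0 (0.00% of RAW stalls)
            Branch/Jump stalls: 16 (50.00% of RAW stalls)
            Floating point stalls: 16 (50.00% of RAW stalls)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (5.30% of all Cycles)
        Trap stalls: 54 (19.08% of all Cycles)
        Total: 101 Stall(s) (35.70% of all Cycles)
    Conditional Branches:
        Total: 16 (8.84% of all Instructions), thereof:
            taken: 15 (93.75% of all cond. Branches)
            not taken: 1 (6.25% of all cond. Branches)
    Load-/Store-Instructions:
        Total: 49 (27.07% of all Instructions), thereof:
            Loads: 33 (67.35% of Load-/Store-Instructions)
            Stores: 16 (32.65% of Load-/Store-Instructions)
    Floating point stage instructions:
        Total: 16 (8.84% of all Instructions), thereof:
            Additions: 16 (100.00% of Floating point stage inst.)
            Multiplications: 0 (0.00% of Floating point stage inst.)
            Divisions: 0 (0.00% of Floating point stage inst.)
    Traps:
        Traps: 18 (9.94% of all Instructions)

[Figure: Clock Cycle Diagram excerpt showing addi r14,r0,0x1098 followed by trap 0x5, which passes through IF, a T-Stall, ID, intEX, MEM and WB.]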
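One way to put the runs in this lab and the previous one on a common scale is cycles per instruction (CPI). The Python sketch below computes it from the "Cycle(s) executed" and "ID executed by N Instruction(s)" figures; treating the second figure as the instruction count is an approximation made for this example, since a few instructions are still in the pipeline when the statistics are read. The helper name `cpi` is hypothetical.

```python
def cpi(cycles, instructions):
    """Cycles per instruction for one run (lower is better)."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


# (label, cycles executed, instructions that passed ID), as reported in the statistics.
runs = [
    ("Lab 2, 50 cycles, forwarding disabled", 50, 27),
    ("Lab 2, 50 cycles, forwarding enabled", 50, 32),
    ("Lab 3A vector addition, forwarding enabled", 283, 181),
]

if __name__ == "__main__":
    for label, cycles, instructions in runs:
        print(f"{label}: CPI = {cpi(cycles, instructions):.2f}")
```

An ideal five-stage pipeline approaches a CPI of 1, so the distance from 1 is a rough measure of how much the stalls listed in the statistics cost.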

[Figure: "Information about trap 0x5" dialog. Adr: Finish+0x8, Code: 0x44000005, First Cycle: -12, Last Cycle: -5, Total Cycles: 8. The IF stage takes 4 cycles, with 3 Stall(s) because of Trap-Pipeline-Clearing; ID executes the system call; intEX, MEM and WB each take 1 cycle with nothing to do and no stalls.]

In the forwarding-enabled run above, control stalls account for 15 cycles (5.30% of all Cycles).

Statistics for the same program with forwarding disabled:

    Total:
        381 Cycle(s) executed.
        ID executed by 181 Instruction(s).
        2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding disabled.
    Stalls:
        RAW stalls: 130 (34.12% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (3.94% of all Cycles)
        Trap stalls: 54 (14.17% of all Cycles)
        Total: 199 Stall(s) (52.23% of all Cycles)
    Conditional Branches:
        Total: 16 (8.84% of all Instructions), thereof:
            taken: 15 (93.75% of all cond. Branches)
            not taken: 1 (6.25% of all cond. Branches)
    Load-/Store-Instructions:
        Total: 49 (27.07% of all Instructions), thereof:
            Loads: 33 (67.35% of Load-/Store-Instructions)
            Stores: 16 (32.65% of Load-/Store-Instructions)
    Floating point stage instructions:
        Total: 16 (8.84% of all Instructions), thereof:
            Additions: 16 (100.00% of Floating point stage inst.)
            Multiplications: 0, Divisions: 0
    Traps:
        Traps: 18 (9.94% of all Instructions)

B. Double-precision floating-point addition: code and result analysis

The listing below is reconstructed from a damaged extraction. The values of a and most of b are illegible, and the operands of subi and addd are inferred from the run statistics (20 loop iterations) and the surrounding data flow.

                  .data
    a:            .double (20 values, illegible in the source)
    b:            .double 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8, ... (20 values, the rest partly illegible)
    PrintfFormat: .asciiz "The result is\n\n%f\t%f\t ... \n\n"
                  .align 2
    PrintfPar:    .word PrintfFormat
    r:            .space 200

                  .text
                  .global main
    main:         addi  r1,r0,0           ; r1 = element index
                  addui r4,r0,8           ; r4 = size of a double in bytes
    loop:         subi  r2,r1,20
                  beqz  r2,finish         ; leave the loop after 20 elements
                  multu r3,r1,r4          ; r3 = byte offset of element r1
                  ld    f0,a(r3)
                  ld    f2,b(r3)
                  addd  f4,f0,f2
                  sd    r(r3),f4
                  addi  r1,r1,1
                  j     loop
    finish:       addi  r14,r0,PrintfPar
                  trap  5                 ; print the result
                  trap  0                 ; end of program

Output in the DLX Standard I/O window (values in the order they were extracted):

    The result is
    3.000000 13.000000 23.000000 33.000000 4.300000 7.000000 9.000000 11.000000 15.000000 17.000000
    18.900000 21.000000 25.000000 27.000000 29.000000 31.000000 35.000000 37.000000 38.900000 41.000000

Statistics with forwarding disabled:

    Total:
        474 Cycle(s) executed.
        ID executed by 186 Instruction(s).
        2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding disabled.
    Stalls:
        RAW stalls: 263 (55.48% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 21 (4.43% of all Cycles)
        Trap stalls: 3 (0.63% of all Cycles)
        Total: 287 Stall(s) (60.55% of all Cycles)
    Conditional Branches:
        Total: 21 (11.29% of all Instructions), thereof:
            taken: 1 (4.76% of all cond. Branches)
            not taken: 20 (95.24% of all cond. Branches)
    Load-/Store-Instructions:
        Total: 60 (32.26% of all Instructions), thereof:
            Loads: 40 (66.67% of Load-/Store-Instructions)
            Stores: 20 (33.33% of Load-/Store-Instructions)
    Floating point stage instructions:
        Total: 40 (21.50% of all Instructions), thereof:
            Additions: 20 (50.00% of Floating point stage inst.)
            Multiplications: 20 (50.00% of Floating point stage inst.)
            Divisions: 0 (0.00% of Floating point stage inst.)
    Traps:
        Traps: 1 (0.54% of all Instructions)

With 4 units of each floating-point kind (faddEX-Stages: 4, fmulEX-Stages: 4, fdivEX-Stages: 4, forwarding disabled), the statistics are unchanged: 474 cycles, 263 RAW stalls (55.48%), 21 control stalls (4.43%), 3 trap stalls (0.63%) and 287 stalls in total (60.55%).

Statistics with forwarding enabled:

    Total:
        352 Cycle(s) executed.
        ID executed by 186 Instruction(s).
        2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding enabled.
    Stalls:
        RAW stalls: 141 (40.06% of all Cycles), thereof:
            LD stalls: 20 (14.18% of RAW stalls)
            Branch/Jump stalls: 21 (14.89% of RAW stalls)
            Floating point stalls: 100 (70.92% of RAW stalls)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 21 (5.96% of all Cycles)
        Trap stalls: 3 (0.85% of all Cycles)
        Total: 165 Stall(s) (46.88% of all Cycles)
    Conditional Branches: Total: 21 (11.29% of all Instructions); taken: 1 (4.76%), not taken: 20 (95.24%)
    Load-/Store-Instructions: Total: 60 (32.26% of all Instructions); Loads: 40 (66.67%), Stores: 20 (33.33%)
    Floating point stage instructions: Total: 40 (21.50% of all Instructions); Additions: 20 (50.00%), Multiplications: 20 (50.00%), Divisions: 0
    Traps: 1 (0.54% of all Instructions)

Lab summary

File names must not contain Chinese characters, otherwise loading fails. For convenience I first named the file "双精度浮点向量加.s", and it could not be loaded however I tried. When writing the double-precision code I was not yet fluent with the instructions: every double-precision arithmetic instruction needs the suffix "d" in its name, while single-precision instructions need the suffix "f". The floating-point setup, including the status register and the floating-point registers, also has to be studied before the lab, otherwise syntax errors appear during the experiment.

This lab deepened my understanding of the analysis from Lab 2 of data, control and structural hazards, of the effect of functional units and of forwarding on the pipeline, and of static instruction scheduling. I wrote the vector addition myself: the code initializes two vectors and adds them component by component. To change the source vectors, it is enough to edit the data in the code. Overall, the lab focused on floating-point computation and on its effect on the pipeline and on performance, and I learned a great deal from it.

Lab 4: Code Optimization

• Objective

Learn simple compiler optimization methods and observe the performance gain they bring.

• Principle

Use static scheduling to reorder the instruction sequence, reducing hazards and optimizing the program.

• Procedure

The vector addition from the previous lab is chosen as the program to optimize.

[Figure: optimized code listing.]

Execution is complete when the instruction trap 0x0 has been reached, as in the figure. After that, open Statistics to examine the results:

    Total:
        316 Cycle(s) executed.
        ID executed by 181 Instruction(s).
        2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding disabled.
    Stalls:
        RAW stalls: 65 (20.57% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (4.75% of all Cycles)
        Trap stalls: 54 (17.10% of all Cycles)
        Total: 134 Stall(s) (42.40% of all Cycles)
    Conditional Branches:
        Total: 16 (8.84% of all Instructions), thereof:
            taken: 15 (93.75% of all cond. Branches)
            not taken: 1 (6.25% of all cond. Branches)
    Load-/Store-Instructions:
        Total: 49 (27.07% of all Instructions), thereof:
            Loads: 33 (67.35% of Load-/Store-Instructions)
            Stores: 16 (32.65% of Load-/Store-Instructions)
    Floating point stage instructions:
        Total: 16 (8.84% of all Instructions), thereof:
            Additions: 16 (100.00% of Floating point stage inst.)
            Multiplications: 0, Divisions: 0
    Traps:
        Traps: 18 (9.94% of all Instructions)

1) Hazard analysis results

Stall figures after optimization:

    RAW stalls: 65 (20.57% of all Cycles)
    WAW stalls: 0 (0.00% of all Cycles)
    Structural stalls: 0 (0.00% of all Cycles)
    Control stalls: 15 (4.75% of all Cycles)
    Trap stalls: 54 (17.10% of all Cycles)
    Total: 134 Stall(s) (42.40% of all Cycles)

Stall figures before optimization:

    RAW stalls: 130 (34.12% of all Cycles)
    WAW stalls: 0 (0.00% of all Cycles)
    Structural stalls: 0 (0.00% of all Cycles)
    Control stalls: 15 (3.94% of all Cycles)
    Trap stalls: 54 (14.17% of all Cycles)
    Total: 199 Stall(s) (52.23% of all Cycles)
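The cycle counts reported in Labs 3 and 4 can be turned into speedups. Because each pair of runs below executes the same number of instructions, the ratio of cycle counts is the speedup. The Python sketch is a plain arithmetic check on the reported figures; the helper names `speedup` and `reduction` are hypothetical.

```python
def speedup(cycles_before, cycles_after):
    """Speedup of the faster run over the slower one for the same instruction count."""
    return cycles_before / cycles_after


def reduction(before, after):
    """Relative reduction of a count, in percent."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Vector addition, forwarding disabled: original code vs. statically scheduled code.
    print(f"static scheduling: {speedup(381, 316):.3f}x, "
          f"RAW stalls down {reduction(130, 65):.0f}%")
    # Vector addition, original code: forwarding disabled vs. enabled.
    print(f"forwarding (vector add): {speedup(381, 283):.3f}x")
    # Double-precision addition: forwarding disabled vs. enabled.
    print(f"forwarding (double add): {speedup(474, 352):.3f}x")
```

On these figures, reordering the instructions halves the RAW stalls and gives a speedup of about 1.2, while enabling forwarding on the unoptimized programs gives about 1.35.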

Comparing the two listings: for data hazards, RAW stalls fall from 34.12% of all cycles before optimization to 20.57% after, a substantial improvement. Structural hazards are unchanged. For control hazards, the share moves from 3.94% to 4.75%, which is no improvement: the count stays at 15 while the total number of cycles falls. The code optimization therefore has only a moderate effect on overall performance, and its main effect is on data hazards.

2) Effect of adding floating-point units on performance

[Figure: two Statistics listings side by side. Left: 4 floating-point units of each kind (faddEX-Stages: 4, fmulEX-Stages: 4, fdivEX-Stages: 4). Right: the default of 1 unit of each kind.]

Both runs, with forwarding disabled, report the same figures:

    Total: 316 Cycle(s) executed. ID executed by 181 Instruction(s). 2 Instruction(s) currently in Pipeline.
    Stalls:
        RAW stalls: 65 (20.57% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (4.75% of all Cycles)
        Trap stalls: 54 (17.10% of all Cycles)
        Total: 134 Stall(s) (42.40% of all Cycles)
    Conditional Branches: Total: 16 (8.84% of all Instructions); taken: 15 (93.75%), not taken: 1 (6.25%)
    Load-/Store-Instructions: Total: 49 (27.07% of all Instructions); Loads: 33 (67.35%), Stores: 16 (32.65%)
    Floating point stage instructions: Total: 16 (8.84% of all Instructions); Additions: 16 (100.00%)
    Traps: 18 (9.94% of all Instructions)

The number of units has no effect on the statistics. The reason is that no structural hazards occur in this computation, so adding units does not increase parallelism and does not improve performance.

3) Effect of adding forwarding on performance

With forwarding enabled:

    Total: 283 Cycle(s) executed. ID executed by 181 Instruction(s).
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding enabled.
    Stalls:
        RAW stalls: 32 (11.31% of all Cycles), thereof:
            LD stalls: 0 (0.00% of RAW stalls)
            Branch/Jump stalls: 16 (50.00% of RAW stalls)
            Floating point stalls: 16 (50.00% of RAW stalls)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (5.30% of all Cycles)
        Trap stalls: 54 (19.08% of all Cycles)
        Total: 101 Stall(s) (35.70% of all Cycles)
    Conditional Branches: Total: 16 (8.84% of all Instructions); taken: 15 (93.75%), not taken: 1 (6.25%)
    Load-/Store-Instructions: Total: 49 (27.07% of all Instructions); Loads: 33 (67.35%), Stores: 16 (32.65%)
    Floating point stage instructions: Total: 16 (8.84% of all Instructions); Additions: 16 (100.00%)

With forwarding disabled:

    Total: 316 Cycle(s) executed. ID executed by 181 Instruction(s). 2 Instruction(s) currently in Pipeline.
    Hardware configuration:
        Memory size: 32768 Bytes
        faddEX-Stages: 1, required Cycles: 2
        fmulEX-Stages: 1, required Cycles: 5
        fdivEX-Stages: 1, required Cycles: 19
        Forwarding disabled.
    Stalls:
        RAW stalls: 65 (20.57% of all Cycles)
        WAW stalls: 0 (0.00% of all Cycles)
        Structural stalls: 0 (0.00% of all Cycles)
        Control stalls: 15 (4.75% of all Cycles)
        Trap stalls: 54 (17.10% of all Cycles)
        Total: 134 Stall(s) (42.40% of all Cycles)
    Conditional Branches: Total: 16 (8.84% of all Instructions); taken: 15 (93.75%), not taken: 1 (6.25%)
    Load-/Store-Instructions: ...
