Chapter 1: Fundamentals of Computer Architecture (lecture slides)

Computer Architecture

Undergraduate Course

Weimin Wu (吴为民)
School of Computer and Information Technology, Beijing Jiaotong University
Spring 2014

Contents
1. Fundamentals of Computer Architecture
2. Instruction Set
3. Pipelining
4. Memory Hierarchy
5. Input-Output Subsystem
6. Interconnection Networks
7. Parallel Computers

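General Course Information
1. The course has 48 class hours (24 sessions): 32 hours of lectures (16 sessions) and 16 hours of lab work (8 sessions).
2. Coursework consists of attendance, in-class assignments, and computer lab assignments.
3. There is a final exam. It is open-book and the exam paper is in English.
4. Grading: coursework 40%, final exam 60%.
5. Try to read the original English text. Where it is hard to follow, consult the Chinese translation of the textbook or Zhang Chenxi's textbook on computer architecture. You may also email me: wmwu@

Important: always write your class number (01 or 03), student ID, and name on homework and lab reports.

1. Fundamentals of Computer Architecture
1.1 Layers of Computer Systems
1.2 Computer Architecture and Implementation
1.3 The Task of a Computer Designer
1.4 Measuring and Reporting Performance
1.5 Quantitative Principles of Computer Design
1.6 Classification of Computer Architecture

1.1 Layers of Computer Systems

M5  Application Language Machine
M4  High-Level Language Machine
M3  Assembly Language Machine
M2  Operating System Machine
M1  Conventional Machine
M0  Microprogram Machine

Each layer performs a related subset of functions and relies on the next lower layer to perform more primitive functions. This decomposes the problem into more manageable subproblems.
Layers M2 through M5 are virtual machines.
Instructions on the conventional machine (arithmetic, logic, and so on) are implemented by a program at the microprogram level. That program acts as an interpreter for a simple set of operations called the microinstruction set.

1.2 Computer Architecture and Implementation

Computer Architecture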

Computer architecture refers to those attributes of a system that are visible to a programmer, or that have a direct impact on the logical execution of a program.

Implementation
Implementation has two components: organization and hardware.
* Organization: includes the high-level aspects of a computer's design, such as the memory system, the bus structure, and the internal design of the CPU.
* Hardware: refers to the specifics of a machine, including the detailed logic design and the packaging technology.

Architectural Attributes
* instruction set
* I/O mechanisms
* techniques for addressing memory
* number of bits used to represent various data types (numbers, characters)

Hardware Attributes
* packaging technology
* power
* cooling

Organizational Attributes
Hardware details that are transparent to the programmer, such as:
* control signals
* computer/peripheral interfaces
* memory technology

Architectural design issue: whether a computer will have a multiply instruction.
Organizational issue: whether that instruction will be implemented by a special multiply unit or by repeated use of the add unit.
The decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.

1.3 The Task of a Computer Designer

The task is a complex one:

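* Determine which attributes are important for a new machine.
* Design a machine that maximizes performance while staying within cost and power constraints. This includes:
    instruction set design
    functional organization
    logic design
    implementation: IC design, packaging, cooling

[Table not recovered: functional requirements and the typical features required or supported.]

Supplementary Material: Milestones in the Development of the Integrated Circuit Industry

1947: Bardeen, Brattain, and Shockley at Bell Labs invented the transistor. They shared the 1956 Nobel Prize in Physics.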

The transistor is the cornerstone of the IC industry.
1952: SONY developed the first transistor-based radio.
1958: Kilby at TI invented the first integrated circuit (IC), for which he received the 2000 Nobel Prize in Physics. Noyce refined it into a practical form.
1965: Moore made his prediction about IC development, now known as Moore's Law.
[Image not recovered: Gordon Moore, Intel co-founder and Chairman Emeritus. Image source: Intel Corporation.]
    Transistor density doubles every 18 to 24 months.
    Performance doubles every 18 to 24 months.
    History has borne the prediction out so far. Will it continue? There are physical limits and economic limits. One example is the lithography process [figure not recovered]: lithographic distortion arises and must be corrected (OPC).
1968: Noyce and Moore founded Intel.
1970: Intel developed 1K DRAM.
1971: Intel developed the 4-bit 4004 microprocessor (2250 transistors).
1976/81: Apple II / IBM PC.
1984: Xilinx invented the FPGA.
1985: Intel began to concentrate on microprocessor products.
1987: TSMC was founded. It is the world's largest dedicated chip manufacturing service company.
1991: ARM developed its first embeddable RISC IP core (chipless IC design).
1996: Samsung developed 1G DRAM.
1998: IBM developed an experimental 1 GHz microprocessor.
1999 or earlier: System-on-Chip (SoC) applications.
2002 or earlier: System-in-Package (SiP) technology.

1.4 Measuring and Reporting Performance

What does "fast" mean?
* A user may say a computer is faster when a program runs in less time.
* A computer center manager may say a computer is faster when it completes more jobs in an hour.
* The computer user is interested in reducing response time, the time between the start and the completion of an event, also referred to as execution time.
* The manager of a data processing center may be interested in increasing throughput, the total amount of work done in a given time.

Comparing design alternatives:
* "X is faster than Y" means that the response time is lower on X than on Y.
* "X is n times faster than Y" means:
$$ \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n $$
* Since execution time is the reciprocal of performance:
$$ n = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{1/\text{Performance}_Y}{1/\text{Performance}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} $$

Measuring Performance

Even execution time can be defined in different ways:
* Wall-clock time, response time, or elapsed time: the latency to complete a task, including disk accesses, memory accesses, input/output activities, and operating system overhead. This is the most straightforward definition.
* With multiprogramming, the CPU works on another program while waiting for I/O and may not necessarily minimize the elapsed time of one program. Hence we need a term that takes this activity into account.
* CPU time: the time the CPU is computing, not including the time spent waiting for I/O or running other programs.
* CPU time can be further divided into user CPU time, the CPU time spent in the program, and system CPU time, the CPU time spent in the operating system performing tasks requested by the program.
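The difference between elapsed time, user CPU time, and system CPU time can be observed directly on most systems. The following is a minimal sketch, not part of the original slides: the helper `measure` and the two stand-in tasks are hypothetical names chosen for illustration, and the exact numbers depend on the machine and on the resolution of the operating system's timers.

```python
import os
import time


def measure(task):
    """Run task() once and return elapsed, user CPU, and system CPU seconds."""
    wall_start = time.perf_counter()   # high-resolution wall-clock timer
    cpu_start = os.times()             # cumulative CPU times of this process

    task()

    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user_cpu": cpu_end.user - cpu_start.user,
        "system_cpu": cpu_end.system - cpu_start.system,
    }


def mostly_waiting():
    # Hypothetical stand-in for an I/O-bound task: the process sleeps,
    # so elapsed time grows while CPU time stays close to zero.
    time.sleep(0.2)


def mostly_computing():
    # Hypothetical stand-in for a CPU-bound task.
    total = 0
    for i in range(300_000):
        total += i * i
    return total


if __name__ == "__main__":
    for task in (mostly_waiting, mostly_computing):
        print(task.__name__, measure(task))
```

For the waiting task, elapsed time should be far larger than the sum of the two CPU times, which is the gap that the multiprogramming remark above refers to.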

Choosing Programs to Evaluate Performance

Four levels of programs are listed below in decreasing order of accuracy of prediction.

1. Real applications
* Examples are compilers for C, text-processing software like Word, and other applications like Photoshop.
* Real applications have input, output, and options that a user can select when running the program.

2. Kernels
* Kernels are small, key pieces extracted from real programs and used to evaluate performance.
* Unlike real programs, no user would run kernel programs, for they exist solely to evaluate performance.
* Kernels are best used to isolate the performance of individual features of a machine, in order to explain the reasons for differences in the performance of real programs.

3. Toy benchmarks
* Toy benchmarks are typically between 10 and 100 lines of code and produce a result the user already knows.
* Programs like Puzzle and Quicksort are popular because they are small, easy to type, and run on almost any computer.

4. Synthetic benchmarks
* Similar in philosophy to kernels, synthetic benchmarks try to match the average frequency of operations and operands of a large set of programs.
* No user runs synthetic benchmarks, because they don't compute anything a user could want.

Benchmark Suites
* Benchmark suites put together collections of benchmarks to measure the performance of processors with a variety of applications.
* A key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks.
* Benchmark suites are made of collections of programs, some of which may be kernels, but many of which are typically real programs.

Reporting Performance Results
* The guiding principle of reporting performance measurements should be reproducibility.
* A report requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and the optimized results.
* It contains the actual performance times, shown both in tabular form and as a graph.

Comparing and Summarizing Performance

Battles are fought over what is the fair way to summarize the relative performance of a collection of programs. For example, two articles on summarizing performance in the same journal took opposing points of view. Figure 1.5, taken from one of the articles, is an example of the confusion that can arise.

[Figure 1.5 not recovered: execution times of programs P1 and P2 on machines A, B, and C.]

The following statements hold:
* A is 10 times faster than B for program P1.
* B is 10 times faster than A for program P2.
* A is 20 times faster than C for program P1.
* C is 50 times faster than A for program P2.
* B is 2 times faster than C for program P1.
* C is 5 times faster than B for program P2.
Taken together, these statements leave the relative performance of A, B, and C unclear.

Total Execution Time: A Consistent Summary Measure

The simplest approach is to use the total execution time of P1 and P2:
* B is 9.1 times faster than A.
* C is 25 times faster than A.
* C is 2.75 times faster than B.
This summary tracks execution time, our final measure of performance. If the workload consisted of running programs P1 and P2 an equal number of times, the statements above would predict the relative execution times.

An average of the execution times is the arithmetic mean:
$$ \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i $$
where $\text{Time}_i$ is the execution time for the i-th program of a total of n in the workload.

Weighted Execution Time

Are programs P1 and P2 in fact run equally in the workload? If not, the execution time must be computed in a way that reflects how often each program occurs. One approach is to assign a weighting factor $w_i$ to each program to indicate the relative frequency of the program in the workload. This is called the weighted arithmetic mean:
$$ \sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i $$
where $\text{Weight}_i$ is the frequency of the i-th program in the workload and $\text{Time}_i$ is the execution time of that program.

Figure 1.6 shows the data from Figure 1.5 with three different weightings, each proportional to the execution time of a workload with a given mix.

[Figure 1.6 not recovered: weighted arithmetic means for machines A, B, and C.]

Normalized Execution Time and the Pros and Cons of Geometric Means

A second approach to an unequal mixture of programs is to normalize execution times to a reference machine and then take the average of the normalized execution times. The performance of a new program can then be predicted by simply multiplying this number by the program's performance on the reference machine.

Average normalized execution time can be expressed as either an arithmetic or a geometric mean. The formula for the geometric mean is
$$ \sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i} $$
where $\text{Execution time ratio}_i$ is the execution time, normalized to the reference machine, for the i-th program of a total of n in the workload.

Geometric means have a nice property for two samples $X_i$ and $Y_i$:
$$ \frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\left(\frac{X_i}{Y_i}\right) $$
That is, the ratio of the geometric means equals the geometric mean of the ratios.

In contrast to arithmetic means, geometric means of normalized execution times are consistent no matter which machine is the reference. Hence, the arithmetic mean should not be used to average normalized execution times.

Figure 1.7 shows some variations using both arithmetic and geometric means.

[Figure 1.7 not recovered: execution times from Figure 1.5 normalized to each machine.]

The arithmetic mean performance varies depending on which machine is the reference:
* In column 2, B's execution time is five times longer than A's, although the reverse is true in column 4.
* In column 3, C is slowest, but in column 9, C is fastest.

The geometric means are independent of normalization:
* A and B have the same performance, and the execution time of C is 0.63 of that of A or B (1/1.58 is 0.63).
* Unfortunately, the total execution time of A is 10 times longer than that of B, and B in turn is about 3 times longer than C.
* As a point of interest, the relationship between the means of the same set of numbers is always: geometric mean ≤ arithmetic mean.

Advantage: the geometric mean is independent of the running times of the individual programs, and it doesn't matter which machine is used to normalize.
Drawback: geometric means violate our fundamental principle of performance measurement, because they do not predict execution time.

Make the Common Case Fast

Perhaps the most important and pervasive principle of computer design is to make the common case fast. In making a design tradeoff, favor the frequent case over the infrequent case. This principle also applies when determining how to spend resources.
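Looking back at the summary measures of Section 1.4, the behavior of total time, weighted arithmetic mean, and geometric mean can be checked with a short script. This is a sketch under a stated assumption: Figure 1.5 is not reproduced in these notes, so the execution times in `TIMES` are values chosen to be consistent with the pairwise statements and totals quoted in the text. The function names are illustrative only.

```python
import math

# Assumed execution times in seconds, chosen to match the statements quoted
# from Figure 1.5 (the figure itself is not available in these notes).
TIMES = {
    "A": {"P1": 1.0, "P2": 1000.0},
    "B": {"P1": 10.0, "P2": 100.0},
    "C": {"P1": 20.0, "P2": 20.0},
}


def total_time(machine):
    """Total execution time of all programs on one machine."""
    return sum(TIMES[machine].values())


def weighted_arithmetic_mean(machine, weights):
    """Sum of Weight_i * Time_i; the weights are expected to sum to 1."""
    return sum(weights[p] * t for p, t in TIMES[machine].items())


def normalized_times(machine, reference):
    """Execution time ratios of `machine` relative to `reference`."""
    return [TIMES[machine][p] / TIMES[reference][p] for p in TIMES[machine]]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    equal_mix = {"P1": 0.5, "P2": 0.5}
    for m in TIMES:
        print(m, "total:", total_time(m),
              "equal-weight mean:", weighted_arithmetic_mean(m, equal_mix))
    for ref in TIMES:
        for m in TIMES:
            norm = normalized_times(m, ref)
            print(f"{m} normalized to {ref}: "
                  f"arithmetic={arithmetic_mean(norm):.2f} "
                  f"geometric={geometric_mean(norm):.2f}")
```

With these assumed values, the arithmetic mean of normalized times makes B look about five times slower than A when A is the reference and A about five times slower than B when B is the reference, while the ratio between any two geometric means stays the same for every reference machine.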

对资源使用也是这个道理1.5QuantitativePrinciplesofComputerDesign计算机设计的量化原理1.5QuantitativePrinciplesofComputerDesignAmdahl’sLaw阿姆达尔定律TheperformancegainobtainedbyimprovingsomeportionofacomputercanbecalculatedusingAmdahl’sLaw.用途Amdahl’sLawstatesthattheperformanceimprovementtobegainedfromusingsomefastermodeofexecutionislimitedbythefractionofthetimethefastermodecanbeused.阿姆达尔定律的涵义:由某些部分加速所得到的性能提高受加速部分的百分率所限。1.5QuantitativePrinciplesofComputerDesign或者Amdahl’sLawdefinesthespeedup

thatcanbegainedbyusingaparticularfeature.Speedupistheratio加速比的定义Amdahl’sLawgivesusaquickwaytofindthespeedupfromsomeenhancement,Speedupoverall,whichdependsontwofactors:加速比取决于两个因素1.Thefractionofthecomputationtimeintheoriginalmachinethatcanbeconvertedtotakeadvantageoftheenhancement.

能加速的部分Fractionenhanced12.Theimprovementgainedbytheenhancedexecutionmode.

能加速的程度Speedupenhanced11.5QuantitativePrinciplesofComputerDesign新的执行时间Theoverallspeedupistheratiooftheexecutiontimes:总体加速比1.5QuantitativePrinciplesofComputerDesignEXAMPLE:Supposethatweareconsideringanenhancementthatruns10timesfasterthantheoriginalmachine,butisonlyusable40%ofthetime.Whatistheoverallspeedupgainedbyincorporatingtheenhancement?例子1.5QuantitativePrinciplesofComputerDesignAmdahl’sLawexpressesthelawofdiminishingreturns(回报递减法则):Theincrementalimprovementinspeedupgainedbyanadditionalimprovementinjustaportionofthecomputationdiminishesasimprovementsareadded.对于一部分性能的提高,总体加速比的提高呈递减AnimportantcorollaryofAmdahl’sLawisthatifanenhancementisonlyusableforafractionofatask,wecan’tspeedupthetaskbymorethanthereciprocalof1minusthatfraction.总体加速比有上界1.5QuantitativePrinciplesofComputerDesignEXAMPLE:Implementationsoffloating-pointsquareroot(FPSQR)

varysignificantlyinperformance.SupposeFPSQRisresponsiblefor20%oftheexecutiontimeofacriticalbenchmark.OneproposalistoaddFPSQRhardwarethatwillspeedupthisoperationbyafactorof10.TheotheralternativeisjusttotrytomakeallFPinstructionsrunfaster;FPinstructionsareresponsibleforatotalof50%oftheexecutiontime.ThedesignteambelievesthattheycanmakeallFPinstructionsruntwotimesfasterwiththesameeffortasrequiredforthefastsquareroot.Comparethesetwodesignalternatives.ANSWER:comparingthespeedups:2.00.751.33ImprovingtheperformanceoftheFPoperationsoverallisslightlybetterbecauseofthehigherfrequency.1.5QuantitativePrinciplesofComputerDesignTheCPUPerformanceEquationCPU性能方程Essentiallyallcomputersareconstructedusingaclockrunningataconstantrate.Thesediscretetimeeventsarecalledticks,clockticks,clockperiods,clocks,cycles,orclockcycles.时钟Computerdesignersrefertothetimeofaclockperiodbyitsduration(e.g.,1ns)orbyitsrate(e.g.,1GHz).CPUtimeforaprogramcanthenbeexpressedintwoways:程序的CPU时间1.5QuantitativePrinciplesofComputerDesignwecanalsocountthenumberofinstructionsexecuted---theinstructionpathlength

orinstructioncount

(IC).指令数

Ifweknowthenumberofclockcyclesandtheinstructioncountwecancalculatetheaveragenumberofclockcyclesperinstruction(CPI).
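Before continuing with the CPU performance equation, the Amdahl's Law examples from earlier in this section can be reproduced with a few lines of code. This is a minimal sketch; the function names `amdahl_speedup` and `speedup_upper_bound` are illustrative and do not come from the slides.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the original execution
    time is accelerated by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must lie in [0, 1]")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


def speedup_upper_bound(fraction_enhanced):
    """Corollary: the limit of the overall speedup as the enhanced mode
    becomes infinitely fast, 1 / (1 - fraction_enhanced)."""
    if fraction_enhanced >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # Enhancement that is 10 times faster but usable 40% of the time.
    print(round(amdahl_speedup(0.4, 10), 2))
    # FPSQR hardware: 20% of the time, 10 times faster.
    print(round(amdahl_speedup(0.2, 10), 2))
    # All FP instructions: 50% of the time, 2 times faster.
    print(round(amdahl_speedup(0.5, 2), 2))
    # No enhancement usable 40% of the time can exceed this bound.
    print(round(speedup_upper_bound(0.4), 2))
```

The diminishing-returns behavior shows up if `speedup_enhanced` is increased while `fraction_enhanced` is held fixed: the result approaches `speedup_upper_bound(fraction_enhanced)` but never reaches it.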

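This allows us to use CPI in the execution time formula:
$$ \text{CPU time} = \text{IC} \times \text{CPI} \times \text{Clock cycle time} $$
Expanding the first formula into units of measurement:
$$ \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}} = \frac{\text{Seconds}}{\text{Program}} = \text{CPU time} $$
or
$$ \text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{Clock rate}} $$

So CPU performance depends on three characteristics: clock cycle (or rate), CPI, and IC. It is difficult to change one parameter in isolation from the others, because the basic technologies involved are interdependent:
* Clock cycle time: hardware technology and organization
* CPI: organization and ISA
* Instruction count: ISA and compiler technology
Luckily, many improvement techniques primarily improve one component with small or predictable impacts on the other two.

Sometimes it is useful in designing the CPU to use:
$$ \text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i $$
where $\text{IC}_i$ represents the number of times instruction i is executed in a program and $\text{CPI}_i$ represents the average number of clock cycles for instruction i. This form can be used to express CPU time as:
$$ \text{CPU time} = \left( \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \right) \times \text{Clock cycle time} $$
and CPI as:
$$ \text{CPI} = \frac{\sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i}{\text{Instruction count}} = \sum_{i=1}^{n} \frac{\text{IC}_i}{\text{Instruction count}} \times \text{CPI}_i $$

EXAMPLE: Suppose we have the following measurements:
* Frequency of FP operations = 25%
* Average CPI of FP operations = 4.0
* Average CPI of other instructions = 1.33
* Frequency of FPSQR = 2%
* CPI of FPSQR = 20
Assume that the two design alternatives are to reduce the CPI of FPSQR to 2 or to reduce the average CPI of all FP operations to 2. Compare these two design alternatives using the CPU performance equation.

ANSWER: First, observe that only the CPI changes; the clock rate and instruction count remain identical. The original CPI with neither enhancement is
$$ \text{CPI}_{original} = (4.0 \times 25\%) + (1.33 \times 75\%) \approx 2.0 $$
We can compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
$$ \text{CPI}_{new\ FPSQR} = \text{CPI}_{original} - 2\% \times (20 - 2) = 2.0 - 0.36 = 1.64 $$
We compute the CPI for the enhancement of all FP instructions the same way or by summing the FP and non-FP CPIs:
$$ \text{CPI}_{new\ FP} = (75\% \times 1.33) + (25\% \times 2.0) \approx 1.5 $$
Since the CPI of the overall FP enhancement is lower, its performance will be better. Specifically, the speedup for the overall FP enhancement is:
$$ \text{Speedup}_{new\ FP} = \frac{\text{CPU time}_{original}}{\text{CPU time}_{new\ FP}} = \frac{\text{CPI}_{original}}{\text{CPI}_{new\ FP}} = \frac{2.0}{1.5} \approx 1.33 $$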
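The FPSQR comparison above is a direct application of the per-class CPI formula, and the arithmetic can be checked with a short sketch. The helper names `cpi_from_mix` and `compare_fp_alternatives` are hypothetical; the measurements are the ones given in the example.

```python
def cpi_from_mix(mix):
    """Average CPI from (frequency, cpi) pairs; frequencies must sum to 1."""
    total_frequency = sum(freq for freq, _ in mix)
    if abs(total_frequency - 1.0) > 1e-9:
        raise ValueError("instruction frequencies must sum to 1")
    return sum(freq * cpi for freq, cpi in mix)


def compare_fp_alternatives():
    """Evaluate both design alternatives from the example."""
    # Original machine: 25% FP operations at CPI 4.0, 75% others at CPI 1.33.
    cpi_original = cpi_from_mix([(0.25, 4.0), (0.75, 1.33)])

    # Alternative 1: FPSQR (2% of instructions) drops from CPI 20 to CPI 2.
    cpi_new_fpsqr = cpi_original - 0.02 * (20 - 2)

    # Alternative 2: all FP operations drop to an average CPI of 2.0.
    cpi_new_fp = cpi_from_mix([(0.25, 2.0), (0.75, 1.33)])

    # Clock rate and instruction count are unchanged, so the speedup is
    # simply the ratio of the CPIs.
    return {
        "cpi_original": cpi_original,
        "cpi_new_fpsqr": cpi_new_fpsqr,
        "cpi_new_fp": cpi_new_fp,
        "speedup_new_fpsqr": cpi_original / cpi_new_fpsqr,
        "speedup_new_fp": cpi_original / cpi_new_fp,
    }


if __name__ == "__main__":
    for name, value in compare_fp_alternatives().items():
        print(f"{name}: {value:.3f}")
```

The unrounded values differ slightly from the rounded figures in the worked answer (for example, the original CPI evaluates to 1.9975 rather than exactly 2.0), but the conclusion is the same.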

Measuring the Components of CPU Performance

To use the CPU performance equation, we need measurements of the individual components.

Determining the clock cycle:
* This is easy for an existing CPU.
* For a completed design, low-level tools called timing estimators or timing verifiers are used.
* For a design that is not completed, the clock cycle is estimated by examining critical paths.

Measuring the instruction count:
* The compiler can be used together with tools that measure instruction set behavior.
* For a compiled version of a program, there are two major methods of obtaining IC:
    The first uses an instruction set simulator that interprets the instructions. It is slow but can measure almost any aspect of instruction set behavior accurately.
    The second uses execution-based monitoring, in which the binary program is modified to include instrumentation code. It is very fast, since the program is executed rather than interpreted.

Measuring the CPI is difficult:
* For simple processors, the CPI can be obtained from a table.
* For modern processors that use techniques such as pipelining and memory hierarchies, designers often use average CPI values, but these averages are computed by measuring the effects of the pipeline and cache structure.
* It is often useful to separate the component arising from the memory system from the component determined by the pipeline. Thus, we can compute the CPI for instruction i as:
$$ \text{CPI}_i = \text{Pipeline CPI}_i + \text{Memory system CPI}_i $$

Using the CPU Performance Equations: More Examples

EXAMPLE: We are considering two alternatives for our conditional branch instructions:
* CPU A: A condition code is set by a compare instruction and followed by a branch that tests the condition code.
* CPU B: A compare is included in the branch.
On both CPUs, the conditional branch instruction takes 2 cycles, and all other instructions take 1 clock cycle. On CPU A, 20% of all instructions executed are conditional branches. Since every branch needs a compare, another 20% of the instructions are compares. Because CPU A does not have the compare included in the branch, assume that its clock cycle time is 1.25 times faster than that of CPU B. Which CPU is faster? What if CPU A's clock cycle time were only 1.1 times faster?

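ANSWER: We can use the CPU performance formula:
$$ \text{CPI}_A = 0.20 \times 2 + 0.80 \times 1 = 1.2 $$
$$ \text{CPU time}_A = \text{IC}_A \times 1.2 \times \text{Clock cycle time}_A $$
The clock cycle time of CPU B is 1.25 times that of CPU A:
$$ \text{Clock cycle time}_B = 1.25 \times \text{Clock cycle time}_A $$
Compares are not executed in CPU B, so 20%/80% = 25% of its instructions are branches:
$$ \text{CPI}_B = 0.25 \times 2 + 0.75 \times 1 = 1.25 $$
Because CPU B does not execute compares, $\text{IC}_B = 0.8 \times \text{IC}_A$, so:
$$ \text{CPU time}_B = \text{IC}_B \times 1.25 \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.25 \times \text{Clock cycle time}_A) = 1.25 \times \text{IC}_A \times \text{Clock cycle time}_A > \text{CPU time}_A $$
So in this case CPU A is faster.

If CPU A were only 1.1 times faster, then $\text{Clock cycle time}_B = 1.10 \times \text{Clock cycle time}_A$ and the performance of CPU B is:
$$ \text{CPU time}_B = \text{IC}_B \times \text{CPI}_B \times \text{Clock cycle time}_B = 0.8 \times \text{IC}_A \times 1.25 \times (1.10 \times \text{Clock cycle time}_A) = 1.10 \times \text{IC}_A \times \text{Clock cycle time}_A < \text{CPU time}_A $$
So in this case CPU B is faster. In essence, this is a tradeoff between clock cycle time and instruction count.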
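The branch-design comparison can be parameterized by the clock cycle ratio to see where the balance tips. The sketch below is an illustration under the assumptions of the example; the function name `compare_branch_designs` is hypothetical, and CPU times are expressed relative to IC_A × Clock cycle time_A so that no absolute values are needed.

```python
def compare_branch_designs(cycle_ratio_b_over_a):
    """Return (time_a, time_b), both in units of IC_A * clock_cycle_time_A.

    cycle_ratio_b_over_a is Clock cycle time_B / Clock cycle time_A,
    e.g. 1.25 when CPU A's clock is 1.25 times faster.
    """
    branch_fraction_a = 0.20    # conditional branches on CPU A
    compare_fraction_a = 0.20   # one compare per branch on CPU A
    branch_cycles = 2
    other_cycles = 1

    # CPU A: branches take 2 cycles, everything else 1 cycle.
    cpi_a = (branch_fraction_a * branch_cycles
             + (1.0 - branch_fraction_a) * other_cycles)
    time_a = 1.0 * cpi_a * 1.0

    # CPU B: compares disappear, so the instruction count shrinks and the
    # same number of branches makes up a larger share of what remains.
    relative_ic_b = 1.0 - compare_fraction_a
    branch_fraction_b = branch_fraction_a / relative_ic_b
    cpi_b = (branch_fraction_b * branch_cycles
             + (1.0 - branch_fraction_b) * other_cycles)
    time_b = relative_ic_b * cpi_b * cycle_ratio_b_over_a

    return time_a, time_b


if __name__ == "__main__":
    for ratio in (1.25, 1.10):
        time_a, time_b = compare_branch_designs(ratio)
        faster = "A" if time_a < time_b else "B"
        print(f"cycle ratio {ratio}: A={time_a:.2f} B={time_b:.2f} "
              f"-> CPU {faster} is faster")
```

Because 0.8 × 1.25 = 1.0, the relative CPU time of B equals the cycle ratio itself, so under these assumptions the two designs break even when CPU B's clock cycle is 1.2 times that of CPU A.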
