Computer Organization and Architecture: Chapter 13, Reduced Instruction Set Computers

William Stallings
Computer Organization and Architecture, 8th Edition
Chapter 13: Reduced Instruction Set Computers

Chapter 13: Reduced Instruction Set Computers
  • Key terms
  • Key points
  • Chapter titles

Key terms
  • CISC: complex instruction set computer
  • RISC: reduced instruction set computer
  • Delayed branch
  • Delayed load
  • HLL: high-level language
  • Register file
  • Register window
  • SPARC

Major Advances in Computers (1)
  • The family concept
      • IBM System/360, 1964
      • DEC PDP-8
      • Separates architecture from implementation
  • Microprogrammed control unit
      • Idea by Wilkes, 1951
      • Produced by IBM S/360, 1964
  • Cache memory
      • IBM S/360 Model 85, 1969

Major Advances in Computers (2)
  • Solid-state RAM (see memory notes)
  • Microprocessors
      • Intel 4004, 1971
  • Pipelining
      • Introduces parallelism into the fetch-execute cycle
  • Multiple processors

The Next Step: RISC
  • Reduced Instruction Set Computer
  • Key features
      • Large number of general-purpose registers, or use of compiler technology to optimize register use
      • Limited and simple instruction set
      • Emphasis on optimising the instruction pipeline

Comparison of processors
  [Table not reproduced]

Driving force for CISC
  • Software costs far exceed hardware costs
  • Increasingly complex high-level languages
  • Semantic gap
  • Leads to:
      • Large instruction sets
      • More addressing modes
      • Hardware implementations of HLL statements, e.g. CASE (switch) on VAX

Vocabulary: semantic = relating to meaning, relating to semantics; gap [gæp] = an opening or break

Intention of CISC
  • Ease compiler writing
  • Improve execution efficiency
      • Complex operations in microcode
  • Support more complex HLLs

Vocabulary: ease = to lighten, to make easier; HLL (abbr.) = high-level language (computing)

Execution Characteristics
  • Operations performed
  • Operands used
  • Execution sequencing
  • Studies have been done based on programs written in HLLs
  • Dynamic studies are measured during the execution of the program

Operations
  • Assignments
      • Movement of data
  • Conditional statements (IF, LOOP)
      • Sequence control
  • Procedure call-return is very time consuming
  • Some HLL instructions lead to many machine code operations

Vocabulary: assignment = allocation; designation

Weighted Relative Dynamic Frequency of HLL Operations [PATT82a]

            Dynamic Occurrence    Machine-Instruction Weighted    Memory-Reference Weighted
            Pascal    C           Pascal    C                     Pascal    C
  ASSIGN    45%       38%         13%       13%                   14%       15%
  LOOP      5%        3%          42%       32%                   33%       26%
  CALL      15%       12%         31%       33%                   44%       45%
  IF        29%       43%         11%       21%                   7%        13%
  GOTO      -         3%          -         -                     -         -
  OTHER     6%        1%          3%        1%                    2%        1%

Operands
  • Mainly local scalar variables
  • Optimisation should concentrate on accessing local variables

                      Pascal    C      Average
  Integer Constant    16%       23%    20%
  Scalar Variable     58%       53%    55%
  Array/Structure     26%       24%    25%

Procedure Calls
  • Very time consuming
  • Depends on number of parameters passed
  • Depends on level of nesting
  • Most programs do not do a lot of calls followed by lots of returns
  • Most variables are local (cf. locality of reference)

Implications
  • Best support is given by optimising most used and most time consuming features
  • Large number of registers
      • Operand referencing
  • Careful design of pipelines
      • Branch prediction etc.
  • Simplified (reduced) instruction set

Large Register File
  • Software solution
      • Require compiler to allocate registers
      • Allocate based on most used variables in a given time
      • Requires sophisticated program analysis
  • Hardware solution
      • Have more registers
      • Thus more variables will be in registers

Registers for Local Variables
  • Store local scalar variables in registers
  • Reduces memory access
  • Every procedure (function) call changes locality
  • Parameters must be passed
  • Results must be returned
  • Variables from calling programs must be restored

Register Windows
  • Only few parameters
  • Limited range of depth of call
  • Use multiple small sets of registers
  • Calls switch to a different set of registers
  • Returns switch back to a previously used set of registers

Register Windows (cont.)
  • Three areas within a register set
      • Parameter registers
      • Local registers
      • Temporary registers
  • Temporary registers from one set overlap parameter registers from the next
  • This allows parameter passing without moving data
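The overlap between one window's temporary registers and the next window's parameter registers can be illustrated with a small Python model. This is only a sketch under assumed sizes: the class name `WindowedRegisterFile`, the window count, and the number of parameter and local registers are all hypothetical, and saving windows to memory when every window is in use is deliberately left out.

```python
class WindowedRegisterFile:
    """Toy model of overlapping register windows (all sizes are assumptions)."""

    def __init__(self, n_windows=4, n_param=2, n_local=3):
        self.n_windows = n_windows
        self.n_param = n_param            # temporaries use the same count
        self.n_local = n_local
        self.stride = n_param + n_local   # new physical registers per window
        self.regs = [0] * (n_windows * self.stride)
        self.cwp = 0                      # current window pointer

    def _phys(self, area, index):
        """Map (area, index) in the current window to a physical register."""
        offsets = {"param": 0, "local": self.n_param, "temp": self.stride}
        sizes = {"param": self.n_param, "local": self.n_local, "temp": self.n_param}
        if not 0 <= index < sizes[area]:
            raise IndexError(f"{area} register {index} out of range")
        # The temporaries start where the NEXT window's parameters start.
        return (self.cwp * self.stride + offsets[area] + index) % len(self.regs)

    def read(self, area, index):
        return self.regs[self._phys(area, index)]

    def write(self, area, index, value):
        self.regs[self._phys(area, index)] = value

    def call(self):
        """A call switches to the next window (overflow handling omitted)."""
        self.cwp = (self.cwp + 1) % self.n_windows

    def ret(self):
        """A return switches back to the previous window."""
        self.cwp = (self.cwp - 1) % self.n_windows


def demo():
    rf = WindowedRegisterFile()
    rf.write("temp", 0, 42)        # caller places an argument in a temporary
    rf.call()
    arg = rf.read("param", 0)      # callee sees it as a parameter, no copy made
    rf.write("param", 0, arg + 1)  # callee leaves its result in the same register
    rf.ret()
    return arg, rf.read("temp", 0)
```

Calling `demo()` passes a value into the callee and a result back to the caller without any register-to-register move, which is the point of the overlap.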

Vocabulary: cont. = (1) contents; (2) continued, continuous

Overlapping Register Windows
  [Figure not reproduced]

Circular Buffer diagram
  [Figure not reproduced; labels: main program, followed by subroutines A to G numbered 1 to 7]

Operation of Circular Buffer
  • When a call is made, a current window pointer is moved to show the currently active register window
  • If all windows are in use, an interrupt is generated and the oldest window (the one furthest back in the call nesting) is saved to memory
  • A saved window pointer indicates where the next saved windows should restore to

Global Variables
  • Allocated by the compiler to memory
      • Inefficient for frequently accessed variables
  • Have a set of registers for global variables

Registers v Cache

  Large Register File                              Cache
  All local scalars                                Recently-used local scalars
  Individual variables                             Blocks of memory
  Compiler-assigned global variables               Recently-used global variables
  Save/Restore based on procedure nesting depth    Save/Restore based on cache replacement algorithm
  Register addressing                              Memory addressing

Referencing a Scalar - Window Based Register File
  [Figure not reproduced]

Referencing a Scalar - Cache
  [Figure not reproduced]

Compiler Based Register Optimization
  • Assume small number of registers (16-32)
  • Optimizing use is up to compiler
  • HLL programs usually have no explicit references to registers (but think about C: register int)
  • Assign symbolic or virtual register to each candidate variable
  • Map (unlimited) symbolic registers to real registers
  • Symbolic registers that do not overlap can share real registers
  • If you run out of real registers some variables use memory

Graph Coloring
  • Given a graph of nodes and edges
  • Assign a color to each node
  • Adjacent nodes have different colors
  • Use minimum number of colors
  • Nodes are symbolic registers
  • Two registers that are live in the same program fragment are joined by an edge
  • Try to color the graph with n colors, where n is the number of real registers
  • Nodes that cannot be colored are placed in memory

Graph Coloring Approach
  [Figure not reproduced]

Why CISC (1)?
  • Compiler simplification?
      • Disputed...
      • Complex machine instructions harder to exploit
      • Optimization more difficult
  • Smaller programs?
      • Program takes up less memory but...
      • Memory is now cheap
      • May not occupy fewer bits, just look shorter in symbolic form
          • More instructions require longer op-codes
          • Register references require fewer bits

Vocabulary: exploit = to develop, to make use of
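The graph-coloring view of register allocation from the Graph Coloring slide can be sketched in Python. This is a minimal illustration, not the method used by any particular compiler: it uses a simple greedy heuristic (which does not guarantee the minimum number of colors), and the live ranges, the symbolic register names A to E, and the function names are all assumed for the example.

```python
def build_interference(live_ranges):
    """Join two symbolic registers by an edge when their live ranges overlap.

    live_ranges maps a name to a half-open interval (start, end) of
    instruction positions; this representation is an assumption.
    """
    names = list(live_ranges)
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_end = live_ranges[a]
            b_start, b_end = live_ranges[b]
            if a_start < b_end and b_start < a_end:  # intervals overlap
                graph[a].add(b)
                graph[b].add(a)
    return graph


def color_registers(graph, n_real):
    """Greedy coloring with n_real colors; returns (assignment, spilled)."""
    assignment = {}
    spilled = []
    # Visit nodes with the most neighbours first (a heuristic, not optimal).
    for node in sorted(graph, key=lambda n: len(graph[n]), reverse=True):
        used = {assignment[nb] for nb in graph[node] if nb in assignment}
        free = [r for r in range(n_real) if r not in used]
        if free:
            assignment[node] = free[0]   # real register number
        else:
            spilled.append(node)         # no color left: keep it in memory
    return assignment, spilled


# Hypothetical live ranges for five symbolic registers.
LIVE = {"A": (0, 4), "B": (1, 3), "C": (2, 6), "D": (4, 8), "E": (6, 9)}
GRAPH = build_interference(LIVE)
```

With three real registers, `color_registers(GRAPH, 3)` colors every node; with two, one symbolic register cannot be colored and is placed in memory, matching the last bullet of the slide.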

Vocabulary: dispute [di'spju:t] = to argue, to contest; simplification = making something simple or simpler

Why CISC (2)?
  • Faster programs?
      • Bias towards use of simpler instructions
      • More complex control unit
      • Microprogram control store larger
      • thus simple instructions take longer to execute
  • It is far from clear that CISC is the appropriate solution

Vocabulary: bias = tendency, inclination

RISC Characteristics
  • One instruction per cycle
  • Register to register operations
  • Few, simple addressing modes
  • Few, simple instruction formats
  • Hardwired design (no microcode)
  • Fixed instruction format
  • More compile time/effort

RISC v CISC
  • Not clear cut
  • Many designs borrow from both philosophies
  • e.g. PowerPC and Pentium II

Vocabulary: philosophies = philosophy; viewpoints

RISC Pipelining
  • Most instructions are register to register
  • Two phases of execution
      • I: Instruction fetch
      • E: Execute (ALU operation with register input and output)
  • For load and store
      • I: Instruction fetch
      • E: Execute (calculate memory address)
      • D: Memory (register to memory or memory to register operation)

Sequential execution
  Figure 13.6a depicts the timing of a sequence of instructions using no pipelining. Clearly, this is a wasteful process.

Figure 13.6b shows a two-stage pipelining scheme, in which the I and E stages of two different instructions are performed simultaneously.

In Figure 13.6c, three instructions can be overlapped, and the improvement is as much as a factor of 3.

Figure 13.6d
  • E1: Register file read
  • E2: ALU operation and register write

Effects of Pipelining
  [Figure not reproduced]

Optimization of Pipelining
  • Delayed branch
  • Delayed load
  • Loop unrolling

Optimization of Pipelining (1): Loop Unrolling
  • Replicate body of loop a number of times
  • Iterate loop fewer times
  • Reduces loop overhead
  • Increases instruction parallelism
  • Improved register, data cache or TLB locality
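The transformation itself can be shown with a short Python sketch. It only illustrates the shape of the rewritten loop: the function names and the unroll factor of 4 are assumptions, and in an interpreted language the pipeline benefits listed above would not be observed. On a real RISC target a compiler would apply the same rewrite to machine code.

```python
def sum_rolled(values):
    """Original loop: one add plus one loop test and increment per element."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


def sum_unrolled_by_4(values):
    """Same computation with the loop body replicated four times."""
    n = len(values)
    limit = n - (n % 4)      # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:         # one loop test and increment per four elements
        total += values[i]
        total += values[i + 1]
        total += values[i + 2]
        total += values[i + 3]
        i += 4
    while i < n:             # clean-up loop for the 0 to 3 leftover elements
        total += values[i]
        i += 1
    return total
```

The unrolled version performs the loop test and index update once per four elements instead of once per element, which is the reduction in loop overhead the slide refers to. The clean-up loop is needed whenever the element count is not a multiple of the unroll factor.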

Vocabulary: unroll = to unfold, to open out (something rolled up); replicate ['replikeit] = to fold; to copy; iterate ['itəreit] = to repeat; overhead ['əuvə'hed] = running expenses, extra cost

Optimization of Pipelining (2): Delayed branch
  • Does not take effect until after execution of following instruction
  • This following instruction is the delay slot

Normal and Delayed Branch

  Address    Normal Branch    Delayed Branch    Optimized Delayed Branch
  100        LOAD  X, rA      LOAD  X, rA       LOAD  X, rA
  101        ADD   1, rA      ADD   1, rA       JUMP  105
  102        JUMP  105        JUMP  106         ADD   1, rA
  103        ADD   rA, rB     NOOP              ADD   rA, rB
  104        SUB   rC, rB     ADD   rA, rB      SUB   rC, rB
  105        STORE rA, Z      SUB   rC, rB      STORE rA, Z
  106                         STORE rA, Z

Use of Delayed Branch
  [Repeats the table above]
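The three columns of the table can be checked with a tiny interpreter written in Python. This is a sketch under stated assumptions: operands are read as source first and destination second, the initial value of X is arbitrary, one delay slot follows each jump, and the function and constant names are invented for the example.

```python
def run(program, delayed_branch, x_value=10):
    """Execute a program given as {address: (op, operands...)}.

    With delayed_branch=True a JUMP takes effect only after the following
    instruction (the delay slot) has executed.
    Returns (list of executed addresses, final value stored at Z).
    """
    memory = {"X": x_value, "Z": 0}
    regs = {"rA": 0, "rB": 0, "rC": 0}

    def value(src):
        return src if isinstance(src, int) else regs[src]

    pc = min(program)
    pending_target = None   # jump target waiting for the delay slot to finish
    trace = []
    while pc in program:
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:      # this instruction is a delay slot
            next_pc, pending_target = pending_target, None
        if op == "LOAD":
            regs[args[1]] = memory[args[0]]
        elif op == "STORE":
            memory[args[1]] = regs[args[0]]
        elif op == "ADD":
            regs[args[1]] += value(args[0])
        elif op == "SUB":
            regs[args[1]] -= value(args[0])
        elif op == "JUMP":
            if delayed_branch:
                pending_target = args[0]
            else:
                next_pc = args[0]
        elif op != "NOOP":
            raise ValueError(f"unknown operation {op}")
        pc = next_pc
    return trace, memory["Z"]


NORMAL = {100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 105),
          103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"),
          105: ("STORE", "rA", "Z")}
DELAYED = {100: ("LOAD", "X", "rA"), 101: ("ADD", 1, "rA"), 102: ("JUMP", 106),
           103: ("NOOP",), 104: ("ADD", "rA", "rB"), 105: ("SUB", "rC", "rB"),
           106: ("STORE", "rA", "Z")}
OPTIMIZED = {100: ("LOAD", "X", "rA"), 101: ("JUMP", 105), 102: ("ADD", 1, "rA"),
             103: ("ADD", "rA", "rB"), 104: ("SUB", "rC", "rB"),
             105: ("STORE", "rA", "Z")}
```

Running `run(NORMAL, False)`, `run(DELAYED, True)` and `run(OPTIMIZED, True)` stores the same value at Z in all three cases. The optimized version does so without the NOOP because the ADD has been moved into the delay slot.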

Optimization of Pipelining (3): Delayed Load
  • Register to be target is locked by processor
  • Continue execution of instruction stream until register required
  • Idle until load complete
  • Re-arranging instructions can allow useful work whilst loading

80486 Instruction Pipeline Examples
  [Figure not reproduced]

Controversy (1)
  • Quantitative
      • compare program sizes and execution speeds
  • Qualitative
      • examine issues of high level language support and use of VLSI real estate

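Vocabulary: controversy ['kɔntrəvə:si] = dispute, debate; quantitative = measured by quantity; qualitative = judged by quality or kind; estate [is'teit] = property, assets

Controversy (2)
  • Problems
      • No pair of RISC and CISC that are directly comparable
      • No definitive set of test programs
      • Difficult to separate hardware effects from compiler effects
      • Most comparisons done on "toy" rather than production machines
      • Most commercial devices are a mixture

Required Reading
  • Stallings chapter 13
  • Manufacturer web sites

Question: In current MIPS processor designs, how are the delay slot and branch prediction related? Are the delay slot and branch prediction two completely unrelated techniques for improving pipeline utilization? Is the delay slot an early performance technique that is now rarely used?

1. Overview

The branch delay slot, put simply, is the ...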
