Chapter 1: Fundamentals of Computer Design

David Patterson
Electrical Engineering and Computer Sciences
University of California, Berkeley
/~pattrsn /~cs252
Original slides created by:

Outline:
- Introduction
- Quantitative Principles of Computer Design
- Classes of Computers
- Computer Architecture
- Trends in Technology
- Power in Integrated Circuits
- Trends in Cost
- Dependability
- Performance
- Fallacies and Pitfalls

What is Computer Architecture?
Functional operation of the individual HW units within a computer system, and the flow of information and control among them.
(Figure labels: Technology; Programming Language Interface; Interface Design (ISA); Measurement & Evaluation; Parallelism; Computer Architecture; Applications; OS; Hardware Organization.)

Abstraction Layers in Modern Systems
- Application
- Algorithm
- Programming Language
- Operating System / Virtual Machine
- Instruction Set Architecture (ISA)
- Microarchitecture
- Gates / Register-Transfer Level (RTL)
- Circuits
- Devices
- Physics
Annotations on the figure:
- Original domain of the computer architect ('50s-'80s)
- Domain of recent computer architecture ('90s)
- Reliability, power, ...
- Parallel computing, security, ...
- Reinvigoration of computer architecture, mid-2000s onward.

Computer Systems: Technology Trends
- 1988: Supercomputers, Massively Parallel Processors, Mini-supercomputers, Minicomputers, Workstations, PCs
- 2002: Powerful PCs and SMP Workstations, Network of SMP Workstations, Mainframes, Supercomputers, Embedded Computers

Crossroads: Conventional Wisdom in Comp. Arch
- Old Conventional Wisdom: Power is free, transistors expensive
- New Conventional Wisdom: "Power wall". Power expensive, transistors free (can put more on chip than can afford to turn on)
- Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (out-of-order, speculation, ...)
- New CW: "ILP wall". Law of diminishing returns on more HW for ILP
- Old CW: Multiplies are slow, memory access is fast
- New CW: "Memory wall". Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
- Old CW: Uniprocessor performance 2X / 1.5 yrs
- New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5(?) yrs
  - Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
  - More, simpler processors are more power efficient

Crossroads: Uniprocessor Performance
- VAX: 25%/year, 1978 to 1986
- RISC + x86: 52%/year, 1986 to 2002
- RISC + x86: ??%/year, 2002 to present (less than 20%)
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, October 2006

Change in Chip Design
Intel 4004 (1971): 4-bit processor,
2312 transistors, 0.4 MHz,
10 micron PMOS, 11 mm² chip
RISC II (1983): 32-bit, 5-stage pipeline, 40,760 transistors, 3 MHz,
3 micron NMOS, 60 mm² chip
125 mm² chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
- RISC II shrinks to ~0.02 mm² at 65 nm
- Caches via DRAM or 1-transistor SRAM ()?
- Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
Processor is the new transistor?

Taking Advantage of Parallelism
- Increasing throughput of server computer via multiple processors or multiple disks
- Detailed HW design
  - Carry lookahead adders use parallelism to speed up computing sums from linear to logarithmic in number of bits per operand
  - Multiple memory banks searched in parallel in set-associative caches
- Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence
  - Not every instruction depends on its immediate predecessor, so executing instructions completely or partially in parallel is possible
  - Classic 5-stage pipeline:
    1) Instruction Fetch (Ifetch),
    2) Register Read (Reg),
    3) Execute (ALU),
    4) Data Memory Access (Dmem),
    5) Register Write (Reg)

Pipelined Instruction Execution
(Figure: instructions in program order versus time in clock cycles, Cycle 1 to Cycle 7; each instruction passes through Ifetch, Reg, ALU, DMem, Reg, overlapped with its neighbors.)

Limits to pipelining
Hazards prevent the next instruction from executing during its designated clock cycle
- Structural hazards: attempt to use the same hardware to do two different things at once
- Data hazards: instruction depends on result of prior instruction still in the pipeline
- Control hazards: caused by delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)
(Figure: pipelined instruction execution, instruction order versus time in clock cycles.)

The Principle of Locality
- The Principle of Locality: programs access a relatively small portion of the address space at any instant of time.
- Two different types of locality:
  - Temporal Locality (Locality in Time): if an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  - Spatial Locality (Locality in Space): if an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)
- Last 30 years, HW relied on locality for memory perf.
(Figure: processor P, cache $, memory MEM.)

Levels of the Memory Hierarchy
Level: capacity; access time; cost
- CPU Registers: 100s Bytes; 300-500 ps (0.3-0.5 ns)
- L1 and L2 Cache: 10s-100s KBytes; ~1 ns to ~10 ns; $1000s/GByte
- Main Memory: GBytes; 80 ns-200 ns; ~$100/GByte
- Disk: 10s TBytes; 10 ms (10,000,000 ns); ~$1/GByte
- Tape: infinite; sec-min; ~$1/GByte
Staging between levels: transfer unit; managed by; size
- Registers and L1 Cache: instructions and operands; program/compiler; 1-8 bytes
- L1 Cache and L2 Cache: blocks; cache controller; 32-64 bytes
- L2 Cache and Memory: blocks; cache controller; 64-128 bytes
- Memory and Disk: pages; OS; 4K-8K bytes
- Disk and Tape: files; user/operator; Mbytes
Upper levels are faster; lower levels are larger.

What Computer Architecture brings to Table
- Other fields often borrow ideas from architecture
- Quantitative Principles of Design
  1. Take Advantage of Parallelism
  2. Principle of Locality
  3. Focus on the Common Case
  4. Amdahl's Law
  5. The Processor Performance Equation
- Careful, quantitative comparisons
  - Define, quantify, and summarize relative performance
  - Define and quantify relative cost
  - Define and quantify dependability
  - Define and quantify power
- Culture of anticipating and exploiting advances in technology
- Culture of well-defined interfaces that are carefully implemented and thoroughly checked

Comp. Arch. is an Integrated Approach
- What really matters is the functioning of the complete system: hardware, runtime system, compiler, operating system, and application
  - In networking, this is called the "End to End argument"
- Computer architecture is not just about transistors, individual instructions, or particular implementations
  - E.g., original RISC projects replaced complex instructions with a compiler + simple instructions

Computer Architecture is
Design and Analysis
- Architecture is an iterative process:
  - Searching the space of possible designs
  - At all levels of computer systems
(Figure: creativity produces good, mediocre, and bad ideas; cost/performance analysis sorts them.)

Focus on the Common Case
- Common sense guides computer design
  - Since it's engineering, common sense is valuable
- In making a design trade-off, favor the frequent case over the infrequent case
  - E.g., the instruction fetch and decode unit is used more frequently than the multiplier, so optimize it 1st
  - E.g., if a database server has 50 disks/processor, storage dependability dominates system dependability, so optimize it 1st
- The frequent case is often simpler and can be done faster than the infrequent case
  - E.g., overflow is rare when adding 2 numbers, so improve performance by optimizing the more common case of no overflow
  - May slow down overflow, but overall performance is improved by optimizing for the normal case
- What is the frequent case, and how much is performance improved by making that case faster => Amdahl's Law
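The question of how much a faster common case helps overall can be sketched numerically with Amdahl's Law. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative assumptions. It evaluates the I/O-bound server case, in which the CPU is made 10X faster but only 40% of the time is CPU time.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only `fraction_enhanced` of the original
    execution time benefits from a speedup of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0.0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def amdahl_limit(fraction_enhanced):
    """Best possible overall speedup: the enhanced part takes zero time."""
    if not 0.0 <= fraction_enhanced < 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1)")
    return 1.0 / (1.0 - fraction_enhanced)


if __name__ == "__main__":
    # 60% of the time is spent waiting for I/O, so 40% can be enhanced.
    print(round(amdahl_speedup(0.4, 10.0), 2))  # 1.56
    print(round(amdahl_limit(0.4), 2))          # 1.67
```

Even an infinitely fast CPU would give only about 1.67X here, which is why the unenhanced fraction, not the headline 10X, dominates the result.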
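Amdahl's Law
$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$
$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$
Best you could ever hope to do:
$$\text{Speedup}_{maximum} = \frac{1}{1 - \text{Fraction}_{enhanced}}$$

Amdahl's Law example
- New CPU 10X faster
- I/O bound server, so 60% time waiting for I/O
$$\text{Speedup}_{overall} = \frac{1}{(1 - 0.4) + \dfrac{0.4}{10}} = \frac{1}{0.64} = 1.56$$
- Apparently, it's human nature to be attracted by 10X faster, vs. keeping in perspective it's just 1.6X faster

Processor performance equation
CPU time = Seconds / Program = (Instructions / Program) × (Cycles / Instruction) × (Seconds / Cycle)
The three factors are instruction count, CPI, and cycle time.
What affects each factor: Inst Count; CPI; Clock Rate
- Program: X; -; -
- Compiler: X; (X); -
- Inst. Set: X; X; -
- Organization: -; X; X
- Technology: -; -; X

Relating Metrics
- CPU execution time
  - Measured time for a running program
  - Easy to measure
- Clock cycles
  - The number of clock pulses for a running program
  - Hard to measure
- Instruction count
  - The number of instructions executed by the program
  - Can be measured by using software tools that profile the execution or by using a simulator of the architecture
- CPI
  - Clock cycles per instruction
  - Computing the CPI needs the clock cycles and the instruction count for each instruction type

Clocks
- A digital circuit has a clock that runs at a constant rate (like a human pulse); the clock is used for signal synchronization
- Cycle time = time for one full cycle (seconds per cycle)
- Clock rate = cycles per second (Hertz or Hz)
  - Also known as clock frequency

Scientific prefixes used with cycle time and clock rate
Prefix; Symbol; Multiple
- tera; T; 10^12
- giga; G; 10^9
- mega; M; 10^6
- kilo; k; 10^3
- milli; m; 10^-3
- micro; u; 10^-6
- nano; n; 10^-9
- pico; p; 10^-12

What's a Clock Cycle?
- Old days: 10 levels of gates
- Today: determined by numerous time-of-flight issues + gate delays
  - clock propagation, wire lengths, drivers
(Figure: latch or register feeding combinational logic.)

CPI (Clock cycles per instruction)
- The average number of clock cycles each instruction takes to execute
- A floating point intensive application might have a higher CPI
- CPU clock cycles = Instruction count × CPI
- CPU time = CPU clock cycles × Clock cycle time
- CPU time = Instruction count × CPI × Clock cycle time
- CPU time = (Instruction count × CPI) / Clock rate

Suppose we have two implementations of the same instruction set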
architecture (ISA).
For some program,
Machine A has a clock cycle time of 10 ns and a CPI of 2.0
Machine B has a clock cycle time of 20 ns and a CPI of 1.2
Which machine is faster for this program, and by how much?
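Comparisons of this kind reduce to CPU time = Instruction count × CPI × Clock cycle time. A minimal Python sketch follows; the function names and the instruction count of one million are illustrative assumptions (the count cancels in the ratio), and the inputs pair the cycle times from the question (10 ns and 20 ns) with the CPI values used in the worked answer (2.0 and 1.2).

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time in nanoseconds: instructions x cycles/instruction x ns/cycle."""
    return instruction_count * cpi * cycle_time_ns


def times_faster(time_x, time_y):
    """n such that X is n times faster than Y: n = time_y / time_x."""
    return time_y / time_x


# Assumed instruction count; both machines run the same program on the
# same ISA, so any positive value gives the same ratio.
INSTRUCTION_COUNT = 1_000_000

time_a = cpu_time_ns(INSTRUCTION_COUNT, cpi=2.0, cycle_time_ns=10.0)
time_b = cpu_time_ns(INSTRUCTION_COUNT, cpi=1.2, cycle_time_ns=20.0)

if __name__ == "__main__":
    print(f"A: {time_a:.0f} ns, B: {time_b:.0f} ns")
    print(f"A is {times_faster(time_a, time_b):.1f} times faster than B")
```

The lower CPI of machine B does not make up for its slower clock: neither CPI nor clock rate alone determines which machine wins.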
CPI Example
Answer:
- Machine A: clock cycle = 10 ns, CPI = 2.0
- Machine B: clock cycle = 20 ns, CPI = 1.2
- CPU clock cycles A = Instruction Count × 2.0
- CPU clock cycles B = Instruction Count × 1.2
- CPU time A = CPU clock cycles A × clock cycle time = Instruction Count × 2.0 × 10 ns = 20 × Instruction Count ns
- CPU time B = Instruction Count × 1.2 × 20 ns = 24 × Instruction Count ns
- Performance A / Performance B = Execution time B / Execution time A = (24 × I) / (20 × I) = 1.2
- Thus, A is 1.2 times faster than B

Three main classes of computers
- Desktop: personal computer
- Server: web servers, file servers, database servers
- Embedded: handheld devices (phones, cameras), dedicated parallel computers

Feature: Desktop; Server; Embedded
- Price of system: $500-$5000; $5000-$5,000,000; $10-$100,000
- Price of microprocessor module: $50-$500; $200-$10,000; $.01-$100
- Critical system design issues: price-performance, graphics performance; throughput, availability, scalability; price, power consumption, application-specific performance

Instruction Set Architecture: Critical Interface
(Figure: the instruction set is the interface between software and hardware.)
Properties of a good abstraction:
- Lasts through many generations (portability)
- Used in many different ways (generality)
- Provides convenient functionality to higher levels
- Permits an efficient implementation at lower levels

Example: MIPS architecture
(Figure: registers r0 (= 0), r1, ..., r31, PC, lo, hi.)
Programmable storage:
- 2^32 × bytes
- 31 × 32-bit GPRs (R0 = 0)
- 32 × 32-bit FP regs (paired DP)
- HI, LO, PC
Data types? Format? Addressing modes?
Arithmetic logical:
- Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU
- AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI
- SLL, SRL, SRA, SLLV, SRLV, SRAV
Memory Access:
- LB, LBU, LH, LHU, LW, LWL, LWR
- SB, SH, SW, SWL, SWR
Control:
- J, JAL, JR, JALR
- BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
32-bit instructions on word boundary

MIPS architecture instruction set format
- Register to register
- Transfer, branches
- Jumps

ISA vs. Computer Architecture
Old definition of computer architecture
= instruction set design
- Other aspects of computer design called implementation
- Insinuates implementation is uninteresting or less challenging
Our view is computer architecture >> ISA
- Architect's job is much more than instruction set design; technical hurdles today are more challenging than those in instruction set design
- Since instruction set design is not where the action is, some conclude computer architecture (using old definition) is not where the action is
  - We disagree on conclusion
  - Agree that ISA is not where the action is (ISA in CA:AQA 4/e appendix)

Moore's Law: 2X transistors / "year"
- "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
- # of transistors per cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)

Tracking Technology Performance Trends
- Drill down into 4 technologies: Disks, Memory, Network, Processors
- Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
  - Performance milestones in each technology
- Compare for bandwidth vs. latency improvements in performance over time
- Bandwidth: number of events per unit time
  - E.g., Mbits/second over network, Mbytes/second from disk
- Latency: elapsed time for a single event
  - E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)
CDC Wren I, 1983
- 3600 RPM
- 0.03 GBytes capacity
- Tracks/Inch: 800
- Bits/Inch: 9550
- Three 5.25" platters
- Bandwidth: 0.6 MBytes/sec
- Latency: 48.3 ms
- Cache: none
Seagate 373453, 2003
- 15000 RPM (4X)
- 73.4 GBytes (2500X)
- Tracks/Inch: 64000 (80X)
- Bits/Inch: 533,000 (60X)
- Four 2.5" platters (in 3.5" form factor)
- Bandwidth: 86 MBytes/sec (140X)
- Latency: 5.7 ms (8X)
- Cache: 8 MBytes

Latency Lags Bandwidth (for last ~20 years)
Performance Milestones (latency improvement, bandwidth improvement)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention; BW = best-case)

Memory: Archaic (Nostalgic) v. Modern (Newfangled)
1980 DRAM
(asynchronous)
- 0.06 Mbits/chip
- 64,000 xtors, 35 mm²
- 16-bit data bus per module, 16 pins/chip
- 13 Mbytes/sec
- Latency: 225 ns
- (no block transfer)
2000 Double Data Rate Synchr. (clocked) DRAM
- 256.00 Mbits/chip (4000X)
- 256,000,000 xtors, 204 mm²
- 64-bit data bus per DIMM, 66 pins/chip (4X)
- 1600 Mbytes/sec (120X)
- Latency: 52 ns (4X)
- Block transfers (page mode)

Latency Lags Bandwidth (last ~20 years)
Performance Milestones
- Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention; BW = best-case)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)
Ethernet 802.3
- Year of Standard: 1978
- 10 Mbits/s link speed
- Latency: 3000 µsec
- Shared media
- Coaxial cable
Ethernet 802.3ae
- Year of Standard: 2003
- 10,000 Mbits/s link speed (1000X)
- Latency: 190 µsec (15X)
- Switched media
- Category 5 copper wire
Coaxial cable: copper core, insulator, braided outer conductor, plastic covering
Twisted pair: copper, 1 mm thick, twisted to avoid antenna effect; "Cat 5" is 4 twisted pairs in bundle

Latency Lags Bandwidth (last ~20 years)
Performance Milestones
- Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
- Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention; BW = best-case)

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)
1982 Intel 80286
- 12.5 MHz
- 2 MIPS (peak)
- Latency 320 ns
- 134,000 xtors, 47 mm²
- 16-bit data bus, 68 pins
- Microcode interpreter, separate FPU chip
- (no caches)
2001 Intel Pentium 4
- 1500 MHz (120X)
- 4500 MIPS (peak) (2250X)
- Latency 15 ns (20X)
- 42,000,000 xtors, 217 mm²
- 64-bit data bus, 423 pins
- 3-way superscalar, dynamic translate to RISC, superpipelined (22 stage), out-of-order execution
- On-chip 8 KB data caches, 96 KB instr. trace cache, 256 KB L2 cache

Latency Lags Bandwidth (last ~20 years)
Performance Milestones (latency improvement, bandwidth improvement)
- Processor: '286, '386, '486, Pentium, Pentium Pro, Pentium 4 (21x, 2250x)
- Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
- Memory Module: 16-bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
- Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
CPU high, Memory low ("Memory Wall")

Rule of Thumb for Latency Lagging BW
- In the time that bandwidth doubles, latency improves by no more than a factor of 1.2 to 1.4 (and capacity improves faster than bandwidth)
- Stated alternatively: bandwidth improves by more than the square of the improvement in latency
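Both statements of the rule of thumb can be checked against the (latency, bandwidth) improvement pairs quoted on the performance milestone slides. The following is a minimal Python sketch; the function names are illustrative assumptions, and the per-doubling figure assumes latency and bandwidth each improved at a steady exponential rate over the period.

```python
import math

# (latency improvement, bandwidth improvement) from the milestone slides.
MILESTONES = {
    "Processor": (21.0, 2250.0),
    "Ethernet": (16.0, 1000.0),
    "Memory module": (4.0, 120.0),
    "Disk": (8.0, 143.0),
}


def bandwidth_beats_latency_squared(latency_gain, bandwidth_gain):
    """True if bandwidth improved by more than the square of the latency gain."""
    return bandwidth_gain > latency_gain ** 2


def latency_gain_per_bandwidth_doubling(latency_gain, bandwidth_gain):
    """Latency improvement factor per doubling of bandwidth, assuming
    steady exponential improvement in both (an assumption of this sketch)."""
    doublings = math.log2(bandwidth_gain)
    return latency_gain ** (1.0 / doublings)


if __name__ == "__main__":
    for name, (lat, bw) in MILESTONES.items():
        per_doubling = latency_gain_per_bandwidth_doubling(lat, bw)
        ok = bandwidth_beats_latency_squared(lat, bw)
        print(f"{name}: {per_doubling:.2f}x latency per BW doubling, "
              f"BW > latency^2: {ok}")
```

For all four technologies the per-doubling latency factor falls between 1.2 and 1.4, which is where the rule's range comes from.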
6 Reasons Latency Lags Bandwidth
1. Moore's Law helps BW more than latency
- Faster transistors, more transistors, more pins help bandwidth
  - MPU transistors: 0.130 vs. 42 M xtors (300X)
  - DRAM transistors: 0.064 vs. 256 M xtors (4000X)
  - MPU pins: 68 vs. 423 pins (6X)
  - DRAM pins: 16 vs. 66 pins (4X)
- Smaller, faster transistors but communicate over (relatively) longer lines: limits latency
  - Feature size: 1.5 to 3 vs. 0.18 micron (8X, 17X)
  - MPU die size: 47 vs. 217 mm² (ratio sqrt -> 2X)
  - DRAM die size: 35 vs. 204 mm² (ratio sqrt -> 2X)

6 Reasons Latency Lags Bandwidth (cont'd)
2. Distance limits latency
- Size of DRAM block -> long bit and word lines -> most of DRAM access time
- Speed of light and computers on network
- 1. & 2. explain linear latency vs. square BW?
3. Bandwidth easier to sell ("bigger = better")
- E.g., 10 Gbits/s Ethernet ("10 Gig") vs. 10 µsec latency Ethernet
- 4400 MB/s DIMM ("PC4400") vs. 50 ns latency
- Even if just marketing, customers now trained
- Since bandwidth sells, more resources thrown at bandwidth, which further tips the balance

6 Reasons Latency Lags Bandwidth (cont'd)
4. Latency helps BW, but not vice versa
- Spinning disk faster improves both bandwidth and rotational latency
  - 3600 RPM -> 15000 RPM = 4.2X
  - Average rotational latency: 8.3 ms -> 2.0 ms
  - Things being equal, also helps BW by 4.2X
- Lower DRAM latency -> more accesses/second (higher bandwidth)
- Higher linear density helps disk BW (and capacity), but not disk latency
  - 9,550 BPI -> 533,000 BPI -> 60X in BW

6 Reasons Latency Lags Bandwidth (cont'd)
5. Bandwidth hurts latency
- Queues help bandwidth, hurt latency (queuing theory)
- Adding chips to widen a memory module increases bandwidth, but higher fan-out on address lines may increase latency
6. Operating System overhead hurts latency more than bandwidth
- Long messages amortize overhead;
overhead is a bigger part of short messages

Define and quantify power (1/2)
- For CMOS chips, traditional dominant energy consumption has been in switching transistors, called dynamic power:
$$\text{Power}_{dynamic} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$$
- For mobile devices, energy is the better metric:
$$\text{Energy}_{dynamic} = \text{Capacitive load} \times \text{Voltage}^2$$
- For a fixed task, slowing clock rate (frequency switched) reduces power, but not energy
- Capacitive load is a function of the number of transistors connected to an output and of the technology, which determines capacitance of wires and transistors
- Dropping voltage helps both, so went from 5 V to 1 V
- To save energy & dynamic power, most CPUs now turn off the clock of inactive modules (e.g., Fl. Pt. Unit)

Example of quantifying power
- Suppose a 15% reduction in voltage results in a 15% reduction in frequency. What is the impact on dynamic power?
$$\frac{\text{Power}_{dynamic,new}}{\text{Power}_{dynamic,old}} = \frac{(0.85 \times \text{Voltage})^2 \times (0.85 \times \text{Frequency switched})}{\text{Voltage}^2 \times \text{Frequency switched}} = 0.85^3 \approx 0.61$$
- Dynamic power falls to about 61% of its original value

Define and quantify power (2/2)
- Because leakage current flows even when a transistor is off, static power is now important too
- Leakage current increases in processors with smaller transistor sizes
- Increasing the number of transistors increases power even if they are turned off
- In 2006, goal for leakage is 25% of total power consumption; high performance designs at 40%
- Very low power systems even gate voltage to inactive modules to control loss due to leakage

Cost of integrated circuits depends on several factors:
- Time: the price drops with time as the learning curve increases
- Volume: the price drops with volume increase
- Commodities: many manufacturers produce the same product; competition brings prices down
(Figure: the price of Intel Pentium 4 and Pentium M.)
(Figure: AMD Opteron microprocessor die.)
(Figure: a 300 mm silicon wafer contains 117 AMD Opteron microprocessor chips in a 90 nm process.)

$$\text{Cost of integrated circuit} = \frac{\text{Cost of die} + \text{Cost of testing die} + \text{Cost of packaging and final test}}{\text{Final test yield}}$$
$$\text{Cost of die} = \frac{\text{Cost of wafer}}{\text{Dies per wafer} \times \text{Die yield}}$$
$$\text{Dies per wafer} = \frac{\pi \times (\text{Wafer diameter}/2)^2}{\text{Die area}} - \frac{\pi \times \text{Wafer diameter}}{\sqrt{2 \times \text{Die area}}}$$
Example:
- Wafer diameter = 300 mm
- Die area = 1.5 cm × 1.5 cm = 2.25 cm²
- Dies per wafer = 270

$$\text{Die yield} = \text{Wafer yield} \times \left( 1 + \frac{\text{Defects per unit area} \times \text{Die area}}{\alpha} \right)^{-\alpha}$$
- Wafer yield: accounts for wafers that are completely bad
- α = 4
- Empirical formula; α corresponds to masking levels in the manufacturing process
Example (defect density = 0.4 per cm²):
- Die area = 1.5 cm × 1.5 cm = 2.25 cm²: die yield = 0.44
- Die area = 1.0 cm × 1.0 cm = 1 cm²: die yield = 0.68
- Smaller die area gives more die yield
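The dies-per-wafer and die-yield formulas on these slides can be checked numerically. The following is a minimal Python sketch; the function names are illustrative assumptions, wafer yield is taken as 100% (which the slide examples imply but do not state), and the wafer cost in the usage lines is a made-up figure used only to show how the pieces combine.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Wafer area divided by die area, minus dies lost along the round edge."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2, die_area_cm2, wafer_yield=1.0, alpha=4.0):
    """Empirical yield model from the slides, with alpha = 4 by default."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2):
    """Cost of wafer divided by the number of good dies it yields."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


if __name__ == "__main__":
    print(round(dies_per_wafer(30.0, 2.25)))   # 270
    print(round(die_yield(0.4, 2.25), 2))      # 0.44
    print(round(die_yield(0.4, 1.0), 2))       # 0.68
    # Hypothetical wafer cost, not taken from the slides.
    print(cost_of_die(5000.0, 30.0, 2.25, 0.4))
```

Shrinking the die helps twice: more dies fit on the wafer and a larger share of them work, so cost per good die falls faster than die area.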
Define and quantify dependability (1/3)
- How to decide when a system is operating properly?
- Infrastructure providers now offer Service Level Agreements (SLA) to guarantee that their networking or power service would be dependable
- Systems alternate between 2 states of service with respect to an SLA:
  1. Service accomplishment, where the service is delivered as specified in the SLA
  2. Service interruption, where the delivered service is different from the SLA
- Failure = transition from state 1 to state 2
- Restoration = transition from state 2 to state 1

Define and quantify dependability (2/3)
- Module reliability = measure of continuous service accomplishment (or time to failure).
  2 metrics:
  1. Mean Time To Failure (MTTF) measures reliability
  2. Failures In Time (FIT) = 1/MTTF, the rate of failures
     - Traditionally reported as failures per billion hours of operation
- Mean Time To Repair (MTTR) measures service interruption
  - Mean Time Between Failures (MTBF) = MTTF + MTTR
- Module availability measures service as it alternates between the 2 states of accomplishment and interruption (number between 0 and 1, e.g. 0.9)
- Module availability = MTTF / (MTTF + MTTR)

Example calculating reliability
- If modules have exponentially distributed lifetimes (age of module does not affect probability of failure), the overall failure rate is the sum of the failure rates of the modules
- Calculate FIT and MTTF for 10 disks (1M hour MTTF per disk), 1 disk controller (0.5M hour MTTF), and 1 power supply (0.2M hour MTTF):
$$\text{FailureRate} = 10 \times \frac{1}{1{,}000{,}000} + \frac{1}{500{,}000} + \frac{1}{200{,}000} = \frac{10 + 2 + 5}{1{,}000{,}000} = \frac{17}{1{,}000{,}000} \text{ per hour} = 17{,}000 \text{ FIT}$$
$$\text{MTTF} = \frac{1{,}000{,}000{,}000}{17{,}000} \approx 59{,}000 \text{ hours}$$

How to Quantify Performance?
- Time to run the task (ExTime)
  - Execution time, response time, latency
- Tasks per day, hour, week, sec, ns ... (Performance)
  - Throughput, bandwidth
Plane: Speed; DC to Paris; Passengers; Throughput (pmph)
- Boeing 747: 610 mph; 6.5 hours; 470; 286,700
- BAC/Sud Concorde: 1350 mph; 3 hours; 132; 178,200

Definition: Performance
- Performance is in units of things per sec
  - bigger is better
- If we are primarily concerned with response time:
$$\text{performance}(X) = \frac{1}{\text{execution\_time}(X)}$$
- "X is n times faster than Y" means:
$$n = \frac{\text{Performance}(X)}{\text{Performance}(Y)} = \frac{\text{Execution\_time}(Y)}{\text{Execution\_time}(X)}$$

Performance: What to measure
- Usually rely on benchmarks vs. real workloads
- To increase predictability, collections of benchmark applications, called benchmark suites, are popular
- SPEC CPU: popular desktop benchmark suite
  - CPU only, split between integer and floating point programs
  - SPECint2000 has 12 integer, SPECfp2000 has 14 floating-point pgms
  - SPEC CPU2006 to be announced Spring 2006
  - SPECSFS (NFS file server) and SPECWeb (Web server) added as server benchmarks
- Transaction Processing Council measures server performance and cost-performance for databases
  - TPC-C: complex query for Online Transaction Processing
  - TPC-H: models ad hoc decision support
  - TPC-W: a transactional web benchmark
  - TPC-App: application server and web services benchmark

SPEC: System Performance Evaluation Cooperative
- First Round 1989
  - 10 programs yielding a single number ("SPECmarks")
- Second Round 1992
  - SPECInt92 (6 integer programs) and SPECfp92 (14 floating point programs)
  - Compiler flags unlimited.
- March 93 new set of programs: SPECint95 (8 integer programs) and SPECfp95 (10 floating point)
  - "benchmarks useful for 3 years"
  - Single flag setting for all programs: SPECint_base95, SPECfp_base95
- SPEC CPU2000 (12 integer benchmarks, CINT2000, and 14 floating-point benchmarks, CFP2000)

Normalized Execution Time
- Normalize execution time to a reference machine
- Two common methods
  - Arithmetic mean
  - Geometric mean
- Comparison
  - Arithmetic mean
    - Use to predict performance
    - May not be consistent
  - Geometric mean
    - Independent of the running times of the individual programs
    - Cannot be used to predict relative execution time for a workload

Normalized Execution Time: Example
Columns: Time on A; Time on B; Normalized to A (A, B); Normalized to B (A, B)
Program 1: 1; 10; 1; 1…
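The "may not be consistent" point about arithmetic means of normalized times can be shown with two programs. The following is a minimal Python sketch; the function names are illustrative assumptions. Program 1 uses the times shown in the table (1 on A, 10 on B), while the Program 2 times (1000 on A, 100 on B) are hypothetical values chosen for illustration.

```python
import math


def normalize(times, reference_times):
    """Divide each program's time by the reference machine's time."""
    return [t / r for t, r in zip(times, reference_times)]


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # Computed via logarithms; all values must be positive.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Execution times for [Program 1, Program 2].
# Program 2 times are hypothetical, not taken from the slides.
TIME_A = [1.0, 1000.0]
TIME_B = [10.0, 100.0]

if __name__ == "__main__":
    b_vs_a = normalize(TIME_B, TIME_A)  # B's times with A as reference
    a_vs_b = normalize(TIME_A, TIME_B)  # A's times with B as reference
    print("arithmetic, B normalized to A:", arithmetic_mean(b_vs_a))
    print("arithmetic, A normalized to B:", arithmetic_mean(a_vs_b))
    print("geometric,  B normalized to A:", geometric_mean(b_vs_a))
    print("geometric,  A normalized to B:", geometric_mean(a_vs_b))
```

With these inputs the arithmetic mean makes each machine look about 5X slower than whichever machine is chosen as the reference, while the geometric mean gives 1 in both directions. The geometric mean is consistent across reference machines, but it says nothing about total running time, where A takes 1001 time units and B takes 110.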