




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1计算机体系结构
cachememory
2次课2LocalityPrincipleofLocality:
ProgramstendtousedataandinstructionswithaddressesnearorequaltothosetheyhaveusedrecentlyTemporallocalityRecentlyreferenceditemsarelikely
tobereferencedagaininthenearfutureSpatiallocalityItemswithnearbyaddressestend
tobereferencedclosetogetherintime3LocalityAlllevelsofmoderncomputersystemsaredesignedtoexploitlocalityHardwareCachememory(tospeedupmainmemoryaccesses)OperatingsystemsUsemainmemorytospeedupvirtualaddressspaceaccessesUsemainmemorytospeedupdiskfileaccessesApplicationprogramsWebbrowsersexploittemporallocalitybycachingrecentlyreferenceddocumentsonalocaldisk4Localityintsumvec(intv[N]){ inti,sum=0;
for(i=0;i<N;i++) sum+=v[i]; returnsum;}Address0481216202428Contentsv0v1v2v3v4v5v6v7Accessorder123456785LocalityLocalityintheexamplesum:temporallocalityv:spatiallocalityStride-1referencepatternStride-kreferencepatternVisitingeveryk-thelementofacontiguousvectorAsthestrideincreases,thespatiallocalitydecreases6Stride-1referencepatternintsumarrayrows(inta[M][N]){ inti,j,sum=0;
for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1234567Stride-Nreferencepatternintsumarraycols(inta[M][N])(M=2,N=3){ inti,j,sum=0;
for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Address048121620Contentsa00a01a02a10a11a12Accessorder1352468LocalityLocalityoftheinstructionfetchSpatiallocalityInmostcases,programsareexecutedinsequentialorderTemporallocalityInstructionsinloopsmaybeexecutedmanytimes9LocalityDatareferencesReferencearrayelementsinsuccession SpatiallocalityReferencevariablesumeachiteration TemporallocalityInstructionreferencesReferenceinstructionsinsequence. SpatiallocalityCyclethroughlooprepeatedly Temporallocalitysum=0;for(i=0;i<n;i++) sum+=a[i];returnsum;10MemoryHierarchyFundamentalpropertiesofstoragetechnologyandcomputersoftwareDifferentstoragetechnologieshavewidelydifferentaccesstimesFastertechnologiescostmoreperbytethansloweronesandhavelesscapacityThegapbetweenCPUandmainmemoryspeediswideningWell-writtenprogramstendtoexhibitgoodlocality11Mainmemoryholdsdiskblocksretrievedfromlocaldisks.registerson-chipL1cache(SRAM)mainmemory(DRAM)localsecondarystorage(localdisks)Larger,slower,andcheaper(perbyte)storagedevicesremotesecondarystorage(distributedfilesystems,Webservers)Localdisksholdfilesretrievedfromdisksonremotenetworkservers.off-chipL2cache(SRAM)L1cacheholdscachelinesretrievedfromtheL2cache.CPUregistersholdwordsretrievedfromcachememory.L2cacheholdscachelinesretrievedfrommemory.L0:L1:L2:L3:L4:L5:Smaller,faster,andcostlier(perbyte)storagedevicesAnexamplememoryhierarchyCachesFundamentalideaofamemoryhierarchy:ForeachK,thefaster,smallerdeviceatlevelKservesasacacheforthelarger,slowerdeviceatlevelK+1.Whydomemoryhierarchieswork?Becauseoflocality,programstendtoaccessthedataat
levelkmoreoftenthantheyaccessthedataatlevelk+1.Thus,thestorageatlevelk+1canbeslower,andthuslargerandcheaperperbit.BigIdea:Thememoryhierarchycreatesalargepoolofstoragethatcostsasmuchasthecheapstoragenearthebottom,butthatservesdatatoprogramsattherateofthefaststoragenearthetop.GeneralCacheConcepts012345678910111213141589143CacheLarger,slower,cheapermemoryviewedaspartitionedinto“blocks”Dataiscopiedinblock-sizedtransferunitsSmaller,faster,moreexpensivememorycachesasubsetoftheblocks44410101014MemoryHierarchyBlocksAtlevelk+1ThestorageispartitionedintocontiguouschunksofdataobjectsEachblockhasauniqueaddressornameBlockscanbefixed-sizeorvariable-sizedAtlevelkThestorageispartitionedintoasmallersetofblocksTheblocksarethesamesizeastheblocksatlevelk+1Thestoragecontainscopiesofasubsetoftheblocksatlevelk+115MemoryHierarchyTransferunitsUsedtocopydatabackandforthbetweenlevelkandlevelk+1GeneralCacheConcepts:Hit012345678910111213141589143CacheDatainblockbisneededRequest:1414Blockbisincache:Hit!GeneralCacheConcepts:Miss012345678910111213141589143CacheDatainblockbisneededRequest:12Blockbisnotincache:Miss!BlockbisfetchedfrommemoryRequest:12121212BlockbisstoredincachePlacementpolicy:
determineswherebgoesReplacementpolicy:
determineswhichblock
getsevicted(victim)TypesofCacheMissesCold(compulsory)missColdmissesoccurbecausethecacheisempty.CapacitymissOccurswhenthesetofactivecacheblocks(workingset)islargerthanthecache.TypesofCacheMissesConflictmissMostcacheslimitblocksatlevelk+1toasmallsubset(sometimesasingleton)oftheblockpositionsatlevelk.e.g.Blockiatlevelk+1mustbeplacedinblock(imod4)atlevelk.Conflictmissesoccurwhenthelevelkcacheislargeenough,butmultipledataobjectsallmaptothesamelevelkblock.e.g.Referencingblocks0,8,0,8,0,8,...wouldmisseverytime.ExamplesofCachingintheHierarchyHardware0On-ChipTLBAddresstranslationsTLBWebbrowser10,000,000LocaldiskWebpagesBrowsercacheWebcacheNetworkbuffercacheBuffercacheVirtualMemoryL2cacheL1cacheRegistersCacheTypeWebpagesPartsoffilesPartsoffiles4-KBpage64-bytesblock64-bytesblock4-8byteswordsWhatisCached?Webproxyserver1,000,000,000RemoteserverdisksOS100MainmemoryHardware1On-ChipL1Hardware10On/Off-ChipL2AFS/NFSclient10,000,000LocaldiskHardware+OS100MainmemoryCompiler0CPUcoreManagedByLatency(cycles)WhereisitCached?Diskcache DisksectorsDiskcontroller100,000Diskfirmware21CacheMemorymainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememoryLiesbetweentheCPUandthemainmemoryCPUlooksfirstfordataincache,theninmainmemoryHoldfrequentlyaccessedblocksofmainmemoryincaches22CacheMemoryHistoryAtverybeginning,3levelsRegisters,mainmemory,storage10yearslater,4levelsRegister,cachememory,mainmemory,storageModernprocessor,5~6levelsRegisters,SRAML1,L2(,L3)cache,mainDRAMmemory,diskstorageCachememoriessmall,fastSRAM-basedmemoriesmanagedbyhardwareautomaticallycanbeon-chip,off-chip23InsertinganL1cachebetweentheCPUandmainmemoryabcdblock10pqrsblock21......wxyzblock30...Thebigslowmainmemoryhasroomformany8-wordblocks.ThesmallfastL1cachehasroomfortwo8-wordblocks.Thetiny,veryfastCPUregisterfilehasroomforfour4-bytewords.Thetransferunitbetweenthecacheandmainmemoryisa8-wordblock(32bytes).ThetransferunitbetweentheCPUregisterfileandthecacheisa4-byteblock.line0line124GenericCacheMemoryOrganization•
•
•B–110•
•
•B–110validvalidtagtagset0:B=2bbytespercacheblockElinespersetS=2ssetsttagbitsperline1validbitperline•
•
••
•
•B–110•
•
•B–110validvalidtagtagset1:•
•
••
•
•B–110•
•
•B–110validvalidtagtagsetS-1:•
•
••
•
•Cacheisanarrayofsets.Eachsetcontainsoneormorelines.Eachlineholdsablockofdata.25CacheMemoryFundamentalparametersParametersDescriptionsS=2sEB=2bm=log2(M)NumberofsetsNumberoflinespersetBlocksize(bytes)Numberofphysical(mainmemory)addressbits26CacheMemoryDerivedquantitiesParametersDescriptionsM=2ms=log2(S)b=log2(B)t=m-(s+b)C=BESMaximumnumberofuniquememoryaddressNumberofsetindexbitsNumberofblockoffsetbitsNumberoftagbitsCachesize(bytes)notincludingoverheadsuchasthevalidandtagbits27MemoryAccessingForamemoryaccessinginstructionmovlA%eaxAccesscachebyAdirectlyIfcachehitgetthevaluefromthecacheOtherwise,cachemisshandlinggetthevalue28Addressingcachestbitssbitsbbits0m-1<tag><setindex><blockoffset>PhysicalAddressA:0m-1Splitinto3parts:29Direct-mappedcacheSimplestkindofcacheCharacterizedbyexactlyonelineperset.validvalidvalidtagtagtag•
•
•set0:set1:setS-1:E=1linespersetcacheblockcacheblockcacheblockp63330AccessingDirect-MappedCachesThreestepsSetselectionLinematchingWordextraction31SetselectionUsethesetindexbitstodeterminethesetofinterestvalidvalidvalidtagtagtag•
•
•set0:set1:setS-1:tbitssbits000010m-1bbitstagsetindexblockoffsetselectedsetcacheblockcacheblockcacheblock32Linematching1tbitssbitsi01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(1)Thevalidbitmustbeset(2)Thetagbitsinthecachelinemustmatchthetagbitsintheaddress011030127456Findavalidlineintheselectedsetwithamatchingtag33WordExtraction1tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):blockoffsetselectsstartingbyte0110w3w0w1w23012745634SimpleMemorySystemCacheCache16lines4-bytelinesizeDirectmapped11109876543210OffsetIndexTag35SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123111150––––21B1000204083360––––4321436D8F0950D13672F01D6310––––716111C2DF0336SimpleMemorySystemCacheIdxTagValidB0B1B2B382413A00518992D0––––A2D19315DA3BB0B0––––C120––––D16104963415E13183771BD3F140––––37AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x0D
Hit?YesByte:0x3638LineReplacementonMissesCheckthecachelineofthesetindicatedbythesetindexbitsIfthecachelinevalid,itmustbeevictedRetrievetherequestedblockfromthenextlevelHowtogettheblock?Currentlineisreplacedbythenewlyfetchedline39Checkthecacheline1selectedset(i):=1?Ifvalidbitisset,evictthelineTag3012745640GettheAddressoftheStartingByteConsidermemoryaddresslookslikethefollowingClearthelastbitsandgettheaddressA0m-1<tag><setindex><blockoffset>0m-1<tag><setindex><blockoffset>xxx………xxx000………00041ReadtheBlockfromtheMemoryPutAontheBus(AisA’000)A0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory42ReadtheBlockfromtheMemoryMainmemoryreadsA’fromthememorybus,retrieves8bytesx,andplacesitonthebusX0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory43ReadtheBlockfromtheMemoryCPUreadsxfromthebusandcopiesitintothecacheline0A’xmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory44ReadtheBlockfromtheMemoryIncreaseA’by1,andcopyYinA’+1intothecacheline.0A’+1YmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory45ReadtheBlockfromtheMemoryRepeatseveraltimes(4or8)0A’+3WmainmemoryI/ObridgebusinterfaceALUregisterfileCPUchipsystembusmemorybusCachememory46Cacheline,setandblockBlockAfixed-sizedpacketofinformationMovesbackandforthbetweenacacheandmainmemory(oralower-levelcache)LineAcontainerinacachethatstoresAblock,thevalidbit,thetagbitsOtherinformationSetAcollectionofoneormorelines47Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xt=1s=2b=1xxx0vtagblock0000123set48Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
missxt=1s=2b=1xxx10m[0]m[1]vtagblock000
0[0000](miss)(1)0123set49Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitxt=1s=2b=1xxx10m[0]m[1]vtagblock000
1[0001](hit)(2)0123set50Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissxt=1s=2b=1xxx10m[0]m[1]vtagblock011m[12]m[13]0
13[1101](miss)(3)0123set51Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmissxt=1s=2b=1xxx11m[8]m[9]vtagblock011m[12]m[13]0
8[1000](miss)(4)0123set52Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0
0[0000](miss)(5)0123set53Direct-mappedcachesimulationExampleM=16byteaddressesB=2bytes/block,S=4sets,E=1entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
missxt=1s=2b=1xxx10m[0]m[1]vtagdata011m[12]m[13]0
0[0000](miss)(5)0123setThrashing!54ConflictMissesinDirect-MappedCaches1 floatdotprod(floatx[8],floaty[8])2 {3 floatsum=0.0;4 inti;56 for(i=0;i<8;i++)7 sum+=x[i]*y[i];8 returnsum;9 }55ConflictMissesinDirect-MappedCachesAssumptionforxandyxisloadedintothe32bytesofcontiguousmemorystartingataddress0ystartsimmediatelyafterxataddress32AssumptionforthecacheAblockis16bytesbigenoughtoholdfourfloatsThecacheconsistsoftwosetsAtotalcachesizeof32bytesThelast6bitsoftheaddressofx[0]is000000Thelast6bitsoftheaddressofx[4]is010000Thelast6bitsoftheaddressofy[0]is100000Thelast6bitsoftheaddressofy[4]is11000056ConflictMissesinDirect-MappedCachesTrashingReadx[0]willloadx[0]~x[3]intothecacheReady[0]willoverloadthecachelinebyy[0]~y[3]57ConflictMissesinDirect-MappedCachesPaddingcanavoidthrashingClaimx[12]insteadofx[8]Thelast6bitsofY[0]is11000058Whyusemiddlebitsasindex?High-OrderBitIndexingAdjacentmemorylineswouldmaptosamecachesetPooruseofspatiallocalityMiddle-OrderBitIndexingConsecutivememorylinesmaptodifferentcachelinesCanholdC-byteregionofaddressspaceincacheatonetime59Direct-mappedcachesimulationAddressbitsAddress(decimal)Tagbits(t=1)Indexbits(s=2)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000011111111000001011010111100000101101011110101010101010101001122330011223360Direct-mappedcachesimulationAddressbitsAddress(decimal)Indexbits(s=2)Tagbits(t=1)Offsetbits(b=1)Setnumber(decimal)01234567891011121314150000000001010101101010101111111100110011001100110101010101010101000011112222333361Whyusemiddlebitsasindex?4-lineCacheHigh-OrderBitIndexingMiddle-OrderBitIndexing000110110000000100100011010001010110011110001001101010111100110111101111000000010010001101000101011001111000100110101011110011011110111162SetassociativecachesCharacterizedbymorethanonelinepersetvalidtagset0:E=2linespersetset1:setS-1:•
•
•cacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblockvalidtagcacheblock63AccessingsetassociativecachesSetselectionidenticaltodirect-mappedcachevalidvalidtagtagset0:validvalidtagtagset1:validvalidtagtagsetS-1:•
•
•tbitssbits000010m-1bbitstagsetindexblockoffsetSelectedsetcacheblockcacheblockcacheblockcacheblockcacheblockcacheblock64AccessingsetassociativecachesLinematchingandwordselectionmustcomparethetagineachvalidlineintheselectedset.(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.10110w3w0w1w211001tbitssbits100i01100m-1bbitstagsetindexblockoffsetselectedset(i):=1?=?(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.3012745665AssociativeCacheCache16lines4-bytelinesize2-waysetassociative11109876543210OffsetIndexTag66SimpleMemorySystemCacheIdxTagValidB0B1B2B30191991123110150––––11B1000204081360––––2321436D8F0920D13672F01D3310––––316111C2DF0367SimpleMemorySystemCacheIdxTagValidB0B1B2B342413A00518942D0––––52D19315DA3B50B0––––6120––––616104963415713183771BD37140––––68AddressTranslationExampleAddress:0x35411109876543210OffsetIndexTag001010101100Offset:0x0 Index:0x05Tag:0x1A
Hit?No
69LineReplacementonMissesIfallthecachelinesofthesetarevalidWhichlineisselectedtobeevicted?LFU(least-frequently-used)ReplacethelinethathasbeenreferencedthefewesttimesoversomepasttimewindowLRU(least-recently-used)ReplacethelinethatwaslastaccessedthefurthestinthepastAllofthesepoliciesrequireadditionaltimeandhardware70SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]xxt=2s=1b=1xx0vtagblock0000011set71SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads): 0[0000]1[0001]13[1101]8[1000]0[0000]
miss100m[0]m[1]vtagblock000
0[0000](miss)(1)xxt=2s=1b=1xx72SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hit100m[0]m[1]vtagblock000
1[0001](hit)(2)xxt=2s=1b=1xx0011set73SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmiss100m[0]m[1]vtagblock1111m[12]m[13]0
13[1101](miss)(3)xxt=2s=1b=1xx0011set74SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss110m[8]m[9]vtagblock111m[12]m[13]10
8[1000](miss)LRU(4)xxt=2s=1b=1xx0011set75SetAssociativeCacheSimulationExampleM=16byteaddressesB=2bytes/block,S=2sets,E=2entry/setAddresstrace(reads):
0[0000]1[0001]13[1101]8[1000]0[0000]
miss
hitmissmiss
miss110m[8]m[9]vtagblock100m[0]m[1]10
0[0000](miss)LRU(5)xxt=2s=1b=1xx0011set76FullyassociativecachesCharacterizedbyallofthelinesintheonlyonesetNosetindexbitsintheaddressset0:validvalidtagtagcacheblockcacheblockvalidtagcacheblock…E=C/Blinesintheoneandonlysettbitsbbitstagblockoffset77AccessingfullyassociativecachesLinematchandwordselectionmustcomparethetagineachvalidline00110w3w0w1w211001tbits10001100m-1bbitstagblockoffset=1?=?(3)If(1)and(2),thencachehit,andblockoffsetselectsstartingbyte.(2)Thetagbitsinoneofthecachelinesmustmatchthetagbitsintheaddress(1)Thevalidbitmustbeset.30127456100110111078IssueswithWritesWritehitsWritethroughCacheupdatesitscopyImmediatelywritesthecorrespondingcacheblocktomemoryWritebackDefersthememoryupdateaslongaspossibleWritingtheupdatedblocktomemoryonlywhenitisevictedfromthecacheMaintainsadirtybitforeachcacheline79IssueswithWritesWritemissesWrite-allocateLoadsthecorrespondingmemoryblockintothecacheThenupdatesthecacheblockNo-write-allocateBypassesthecacheWritestheworddirectlytomemoryCombinationWritethrough,no-write-allocateWriteback,write-allocate(modernimplementation)80Multi-levelcachesL1d-cache,i-cache32k8-wayAccess:4cyclesL2unified-cache256k8-wayAccess:11cyclesL3unified-cache8M16-wayAccess:30~40cyclesBlocksize64bytesforallcache81CacheperformancemetricsMissRatefractionofmemoryreferencesnotfoundincache(misses/references)Typicalnumbers:3-10%forL1Canbequitesmall(<1%)forL2,dependingonsizeHitRatefractionofmemoryreferencesfoundincache(1-missrate)82CacheperformancemetricsHitTimetimetodeliveralineinthecachetotheprocessor(includestimetodeterminewhetherthelineisinthecache)Typicalnumbers:1-2clockcyclesforL1(4cyclesincorei7)5-10clockcyclesforL2(11cyclesincorei7)MissPenaltyadditionaltimerequiredbecauseofamissTypically50-200cyclesformainmemory(Trend:increasing!)83WhatdoesHitRateMean?ConsiderHitTime:2cyclesMissPenalty:200cyclesAverageaccesstime:Hitrate99%:2*0.99+200*0.01=4cyclesHitrate97%:2*0.97+200*0.03=8cyclesThisiswhy“missrate”isusedinsteadof“hitrate”84CacheperformancemetricsCachesizeHitratevs.hittimeBlocksizeSpatiallocalityvs.temporallocalityAssociativityThrashingCostSpeedMisspenaltyWritestrategySimple,readmisses,fewertransfer85WritingCache-FriendlyCodePrinciplesProgramswithbetterlocalitywilltendtohavelowermissratesProgramswithlowermissrateswilltendtorunfasterthanprogramswithhighermissrates86WritingCache-FriendlyCodeBasicapproachMakethecommoncasegofastProgramsoftenspendmostoftheirtimeinafewcorefunctions.ThesefunctionsoftenspendmostoftheirtimeinafewloopsMinimizethenumberofcachemissesineachinnerloop87WritingCache-FriendlyCode8[h]7[h]6[h]5[m]4[h]3[h]2[h]1[m]Accessorder,[h]itor[m]issi=7i=6i=5i=4i=3i=2i=1i=0v[i]Temporallocality,Thesevariablesareusuallyputinregisters(pp.650)intsumvec(intv[N]){ inti,sum=0;
for(i=0;i<N;i++) sum+=v[i]; returnsum;}Assumption:WordsarebytesCacheblocksare4wordsVisblockaligned88Writingcache-friendlycodeTemporallocalityRepeatedreferencestolocalvariablesaregoodbecausethecompilercancachethemintheregisterfile89Writingcache-friendlycodeSpatiallocalityStride-1referencespatternsaregoodbecausecachesatalllevelsofthememoryhierarchystoredataascontiguousblocksSpatiallocalityisespeciallyimportantinprogramsthatoperateonmultidimensionalarrays90Writingcache-friendlycodeExample(pp.651,M=4,N=8)intsumarrayrows(inta[M][N]){ inti,j,sum=0;
for(i=0;i<M;i++) for(j=0;j<N;j++) sum+=a[i][j]; returnsum;}Assumption:Sameasthepreviousone91Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]9[m]17[m]25[m]2[h]10[h]18[h]26[h]3[h]11[h]19[h]27[h]4[h]12[h]20[h]28[h]5[m]13[m]21[m]29[m]6[h]14[h]22[h]30[h]7[h]15[h]23[h]31[h]8[h]16[h]24[h]32[h]92Writingcache-friendlycodeExample(pp.651,M=4,N=8)
intsumarraycols(inta[M][N]){ inti,j,sum=0;
for(j=0;j<N;j++) for(i=0;i<M;i++) sum+=a[i][j]; returnsum;}Assumption:SameasthepreviousoneArraysizeislargerthanthecachesize93Writingcache-friendlycodea[i][j]j=0j=1j=2j=3j=4j=5j=6j=7i=0i=1i=2i=31[m]2[m]3[m]4[m]5[m]6[m]7[m]8[m]9[m]10[m]11[m]12[m]13[m]14[m]15[m]16[m]17[m]18[m]19[m]20[m]21[m]22[m]23[m]24[m]25[m]26[m]27[m]28[m]29[m]30[m]31[m]32[m]94TheMemoryMountain95TheMemoryMountainReadthroughput(readbandwidth)TheratethataprogramreadsdatafromthememorysystemMemorymountainReaddatafromanarrayAtwo-dimensionalfunctionofreadbandwidthismeasuredCharacterizesthecapabilitiesofthememorysystemforeachcomputer96WorkingSetDimension97AccessStrideDimension98TheMemoryMountainDataSizeMAXBYTES(64M)bytesorMAXELEMS(8M)doublesPartiallyaccessedWorkingset:from64MBto2KBStride:from1to6499Memorymountainmainroutine/*mountain.c-Generatethememorymountain.*/#defineMINBYTES(1<<11)/*Workingsetsizerangesfrom2KB*/#defineMAXBYTES(1<<26)/*...upto64MB*/#defineMAXSTRIDE64/*Stridesrangefrom1to64*/#defineMAXELEMSMAXBYTES/sizeof(data)doubledata[MAXELEMS];/*Thearraywe'llbetraversing*/100Memorymountainmainroutineintmain(){intsize; /*Workingsetsize(inbytes)*/intstride; /*Stride(inarrayelements)*/doubleMhz; /*Clockfrequency*/init_data(data,MAXELEMS);/*Initializeeachelementindatato1*/Mhz=mhz(0); /*Estimatetheclockfrequency*/101Memorymountainmainroutinefor(size=MAXBYTES;size>=MINBYTES;size>>=1){ for(stride=1;stride<=MAXSTRIDE;stride++) printf("%.1f
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 2025年台历购销合同书模板
- 2025房产长期租赁合同
- 关爱自闭症儿童培训会
- 2025建筑工程围挡施工合同
- 股份制企业运营策略及文书编写指南
- 新一代信息技术产业发展趋势分析
- 人力资源行业员工绩效管理方案
- 小学生防拐防骗教学课件
- 创意设计与平面设计作业指导书
- 场馆维修委托协议
- 幼儿园教法与学法
- 《班级植物角我养护》(课件)-二年级上册劳动浙教版
- (已压缩)矿产资源储量技术标准解读300问-1-90
- 古诗《江上渔者》课件
- 韶关市房地产市场调研报告
- 校园诚信教育(课件)-小学生主题班会
- JJF(陕) 065-2021 弯折试验机校准规范
- 电力工程线路交叉跨越施工主要工序及特殊工序施工方法
- 反恐防暴应急知识培训
- 2024-2030年版越南投资环境行业投资分析及未来发展规划研究报告
- 罗汉果行业深度研究与市场前景分析报告
评论
0/150
提交评论