Computer Organization and Architecture, English-language courseware: Chapter 4 Cache Memory

William Stallings
Computer Organization and Architecture, 7th Edition
Chapter 4: Cache Memory

Key Terms (1)
  Cache; cache memory; access time; cache hit; cache line; cache miss; cache set; data cache; direct access; hit ratio; instruction cache; L1 cache; L2 cache; L3 cache; locality; memory hierarchy; line; slot

Key Terms (2)
  Multilevel cache; random access; sequential access; mapping; tag; set-associative mapping; associative mapping; spatial locality; temporal locality; split cache; unified cache; write back; write once; write through

Characteristics
  - Location
  - Capacity
  - Unit of transfer
  - Access method
  - Performance
  - Physical type
  - Physical characteristics
  - Organisation

Location
  - CPU
  - Internal
  - External

Capacity
  - Word size
    - The natural unit of organisation
  - Number of words
    - or bytes

Unit of Transfer
  - Internal
    - Usually governed by data bus width
  - External
    - Usually a block which is much larger than a word
  - Addressable unit
    - Smallest location which can be uniquely addressed
    - Word internally
    - Cluster on M$ disks

Access Methods (1)
  - Sequential
    - Start at the beginning and read through in order
    - Access time depends on location of data and previous location
    - e.g. tape
  - Direct
    - Individual blocks have a unique address
    - Access is by jumping to the vicinity plus sequential search
    - Access time depends on location and previous location
    - e.g. disk
  (vicinity: the surrounding or nearby area)

Access Methods (2)
  - Random
    - Individual addresses identify locations exactly
    - Access time is independent of location or previous access
    - e.g. RAM
  - Associative
    - Data is located by a comparison with contents of a portion of the store
    - Access time is independent of location or previous access
    - e.g. cache
  (associative: related, linked)

Memory Hierarchy
  - Registers
    - In CPU
  - Internal or main memory
    - May include one or more levels of cache
    - "RAM"
  - External memory
    - Backing store

[Figure: Memory Hierarchy - Diagram]

Performance
  - Access time
    - Time between presenting the address and getting the valid data
  - Memory cycle time
    - Time may be required for the memory to "recover" before next access
    - Cycle time is access + recovery
  - Transfer rate
    - Rate at which data can be moved

Physical Types
  - Semiconductor
    - RAM
  - Magnetic
    - Disk & Tape
  - Optical
    - CD & DVD
  - Others
    - Bubble
    - Hologram

Physical Characteristics
  - Decay
  - Volatility
  - Erasable
  - Power consumption

Organisation
  - Physical arrangement of bits into words
  - Not always obvious
  - e.g. interleaved

The Bottom Line
  - How much?
    - Capacity
  - How fast?
    - Time is money
  - How expensive?

Hierarchy List
  - Registers
  - L1 Cache
  - L2 Cache
  - Main memory
  - Disk cache
  - Disk
  - Optical
  - Tape
  (cache, verb: to store away; cache, noun: in computing, a fast buffer store)

So you want fast?
  - It is possible to build a computer which uses only static RAM (see later)
  - This would be very fast
  - This would need no cache
    - How can you cache cache?
  - This would cost a very large amount

Locality of Reference
  - During the course of the execution of a program, memory references tend to cluster
  - e.g. loops
  (e.g.: Latin abbreviation, "for example"; cluster: in computing, a group)

Cache
  - Small amount of fast memory
  - Sits between normal main memory and CPU
  - May be located on CPU chip or module

[Figure: Cache and Main Memory]

[Figure: Cache/Main Memory Structure]

Cache operation overview
  - CPU requests contents of memory location
  - Check cache for this data
  - If present, get from cache (fast)
  - If not present, read required block from main memory to cache
  - Then deliver from cache to CPU
  - Cache includes tags to identify which block of main memory is in each cache slot

[Figure: Cache Read Operation - Flowchart]

Cache Design
  - Size
  - Mapping Function
  - Replacement Algorithm
  - Write Policy
  - Block Size
  - Number of Caches

Size does matter
  - Cost
    - More cache is expensive
  - Speed
    - More cache is faster (up to a point)
    - Checking cache for data takes time

[Figure: Typical Cache Organization]

Comparison of Cache Sizes
    Processor        | Type                          | Year | L1 cache      | L2 cache       | L3 cache
    IBM 360/85       | Mainframe                     | 1968 | 16 to 32 KB   | -              | -
    PDP-11/70        | Minicomputer                  | 1975 | 1 KB          | -              | -
    VAX 11/780       | Minicomputer                  | 1978 | 16 KB         | -              | -
    IBM 3033         | Mainframe                     | 1978 | 64 KB         | -              | -
    IBM 3090         | Mainframe                     | 1985 | 128 to 256 KB | -              | -
    Intel 80486      | PC                            | 1989 | 8 KB          | -              | -
    Pentium          | PC                            | 1993 | 8 KB/8 KB     | 256 to 512 KB  | -
    PowerPC 601      | PC                            | 1993 | 32 KB         | -              | -
    PowerPC 620      | PC                            | 1996 | 32 KB/32 KB   | -              | -
    PowerPC G4       | PC/server                     | 1999 | 32 KB/32 KB   | 256 KB to 1 MB | 2 MB
    IBM S/390 G4     | Mainframe                     | 1997 | 32 KB         | 256 KB         | 2 MB
    IBM S/390 G6     | Mainframe                     | 1999 | 256 KB        | 8 MB           | -
    Pentium 4        | PC/server                     | 2000 | 8 KB/8 KB     | 256 KB         | -
    IBM SP           | High-end server/supercomputer | 2000 | 64 KB/32 KB   | 8 MB           | -
    CRAY MTA         | Supercomputer                 | 2000 | 8 KB          | 2 MB           | -
    Itanium          | PC/server                     | 2001 | 16 KB/16 KB   | 96 KB          | 4 MB
    SGI Origin 2001  | High-end server               | 2001 | 32 KB/32 KB   | 4 MB           | -
    Itanium 2        | PC/server                     | 2002 | 32 KB         | 256 KB         | 6 MB
    IBM POWER5       | High-end server               | 2003 | 64 KB         | 1.9 MB         | 36 MB
    CRAY XD-1        | Supercomputer                 | 2004 | 64 KB/64 KB   | 1 MB           | -

Mapping Function
  - Cache of 64 KBytes
  - Cache block of 4 bytes
    - i.e. cache is 16k (2^14) lines of 4 bytes
  - 16 MBytes main memory
  - 24 bit address (2^24 = 16M)

Direct Mapping
  - Each block of main memory maps to only one cache line
    - i.e. if a block is in cache, it must be in one specific place
  - Address is in two parts
  - Least significant w bits identify unique word
  - Most significant s bits specify one memory block
  - The MSBs are split into a cache line field r and a tag of s-r (most significant)

Direct Mapping Address Structure
    Tag s-r | Line or Slot r | Word w
    8       | 14             | 2
  - 24 bit address
  - 2 bit word identifier (4 byte block)
  - 22 bit block identifier
    - 8 bit tag (= 22 - 14)
    - 14 bit slot or line
  - No two blocks in the same line have the same Tag field
  - Check contents of cache by finding line and checking Tag

Direct Mapping Cache Line Table
    Cache line | Main memory blocks held
    0          | 0, m, 2m, 3m, ..., 2^s - m
    1          | 1, m+1, 2m+1, ..., 2^s - m + 1
    ...
    m-1        | m-1, 2m-1, 3m-1, ..., 2^s - 1
  Example with m = 10:
    Cache line | Main memory blocks held
    0          | 0, 10, 20, 30, ...
    1          | 1, 11, 21, 31, ...
    ...
    9          | 9, 19, 29, 39, ...

[Figure: Direct Mapping Cache Organization]

[Figure: Direct Mapping Example]

Direct Mapping Summary
  - Address length = (s + w) bits
  - Number of addressable units = 2^(s+w) words or bytes
  - Block size = line size = 2^w words or bytes
  - Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
  - Number of lines in cache = m = 2^r
  - Size of tag = (s - r) bits
  - Exercise: A direct cache consists of 64 lines.
    Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

Direct Mapping pros & cons
  - Simple
  - Inexpensive
  - Fixed location for given block
    - If a program repeatedly accesses 2 blocks that map to the same line, cache misses are very high
  (pros & cons: arguments for and against; miss: to fail to hit, catch, or reach)

Associative Mapping
  - A main memory block can load into any line of cache
  - Memory address is interpreted as tag and word
  - Tag uniquely identifies block of memory
  - Every line's tag is examined for a match
  - Cache searching gets expensive

[Figure: Fully Associative Cache Organization]

[Figure: Associative Mapping Example]

Associative Mapping Address Structure
    Tag 22 bit | Word 2 bit
  - 22 bit tag stored with each 32 bit block of data
  - Compare tag field with tag entry in cache to check for hit
  - Least significant 2 bits of address identify which byte is required from the 32 bit data block
  - e.g.
      Address | Tag    | Data     | Cache line
      FFFFFC  | FFFFFC | 24682468 | 3FFF

Associative Mapping Summary
  - Address length = (s + w) bits
  - Number of addressable units = 2^(s+w) words or bytes
  - Block size = line size = 2^w words or bytes
  - Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
  - Number of lines in cache = undetermined
  - Size of tag = s bits
  - Exercise: An associative cache consists of 64 lines. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

Set Associative Mapping
  - Cache is divided into a number of sets
  - Each set contains a number of lines
  - A given block maps to any line in a given set
    - e.g. Block B can be in any line of set i
  - e.g. 2 lines per set
    - 2 way associative mapping
    - A given block can be in one of 2 lines in only one set

Set Associative Mapping Example
  - 13 bit set number
  - Block number in main memory is modulo 2^13
  - 000000, 008000, 010000, 018000 ... map to same set

[Figure: Two Way Set Associative Cache Organization]

Set Associative Mapping Address Structure
    Tag 9 bit | Set 13 bit | Word 2 bit
  - Use set field to determine cache set to look in
  - Compare tag field to see if we have a hit
  - e.g.
      Address  | Tag | Data     | Set number
      1FF 7FFC | 1FF | 12345678 | 1FFF
      001 7FFC | 001 | 11223344 | 1FFF

[Figure: Two Way Set Associative Mapping Example]

Set Associative Mapping Summary
  - Address length = (s + w) bits
  - Number of addressable units = 2^(s+w) words or bytes
  - Block size = line size = 2^w words or bytes
  - Number of blocks in main memory = 2^s
  - Number of lines in set = k
  - Number of sets = v = 2^d
  - Number of lines in cache = kv = k * 2^d
  - Size of tag = (s - d) bits
  - Exercise: A set associative cache consists of 64 lines, or slots, divided into two-line sets. Main memory contains 4K blocks of 128 words each. Show the format of main memory addresses.

Replacement Algorithms (1): Direct mapping
  - No choice
  - Each block only maps to one line
  - Replace that line

Replacement Algorithms (2): Associative & Set Associative
  - Hardware implemented algorithm (speed)
  - Least recently used (LRU)
    - e.g. in 2 way set associative
    - Which of the 2 blocks is LRU?
  - First in first out (FIFO)
    - Replace block that has been in cache longest
  - Least frequently used
    - Replace block which has had fewest hits
  - Random

Write Policy
  - Must not overwrite a cache block unless main memory is up to date
  - Multiple CPUs may have individual caches
  - I/O may address main memory directly

Write through
  - All writes go to main memory as well as cache
  - Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
  - Lots of traffic
  - Slows down writes
  - Remember bogus write through caches!

Write back
  - Updates initially made in cache only
  - Update bit for cache slot is set when update occurs
  - If block is to be replaced, write to main memory only if update bit is set
  - Other caches get out of sync
  - I/O must access main memory through cache
  - N.B. 15% of memory references are writes

Pentium 4 Cache
  - 80386: no on chip cache
  - 80486: 8 KB using 16 byte lines and four way set associative organization
  - Pentium (all versions): two on chip L1 caches
    - Data & instructions
  - Pentium III: L3 cache added off chip
  - Pentium 4
    - L1 caches
      - 8 KB
      - 64 byte lines
      - four way set associative
    - L2 cache
      - Feeding both L1 caches
      - 256 KB
      - 128 byte lines
      - 8 way set associative
    - L3 cache on chip

Intel Cache Evolution
  (each solution is followed by the processor on which the feature first appears)
  - Problem: External memory slower than the system bus.
    - Solution: Add external cache using faster memory technology. (386)
  - Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
    - Solution: Move external cache on-chip, operating at the same speed as the processor. (486)
  - Problem: Internal cache is rather small, due to limited space on chip.
    - Solution: Add external L2 cache using faster technology than main memory. (486)
  - Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit's data access takes place.
    - Solution: Create separate data and instruction caches. (Pentium)
  - Problem: Increased processor speed results in external bus becoming a bottleneck for L2 cache access.
    - Solution: Create separate back-side bus that runs at higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache. (Pentium Pro)
    - Solution: Move L2 cache on to the processor chip. (Pentium II)
  - Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
    - Solution: Add external L3 cache. (Pentium III)
    - Solution: Move L3 cache on-chip. (Pentium 4)

[Figure: Pentium 4 Block Diagram]

Pentium 4 Core Processor
  - Fetch/Decode Unit
    - Fetches instructions from L2 cache
    - Decodes into micro-ops
    - Stores micro-ops in L1 cache
  - Out of order execution logic
    - Schedules micro-ops
    - Based on data dependence and resources
    - May speculatively execute
  - Execution units
    - Execute micro-ops
    - Data from L1 cache
    - Results in registers
  - Memory subsystem
    - L2 cache and systems bus

Pentium 4 Design Reasoning
  - Decodes instructions into RISC like micro-ops before L1 cache
  - Micro-ops fixed length
    - Superscalar pipelining and scheduling
  - Pentium instructions long & complex
  - Performance improved by separating decoding from scheduling & pipelining
    - (More later, ch14)
  - Data cache is write back
    - Can be configured to write through
  - L1 cache controlled by 2 bits in register
    - CD = cache disable
    - NW = not write through
    - 2 instructions to invalidate (flush) cache and write back then invalidate
  - L2 and L3 8-way set-associative
    - Line size 128 bytes

PowerPC Cache Organization
  - 601: single 32 KB, 8 way set associative
  - 603: 16 KB (2 x 8 KB), two way set associative
  - 604: 32 KB
  - 620: 64 KB
  - G3 & G4
    - 64 KB L1 cache
      - 8 way set associative
    - 256 KB, 512 KB or 1 MB L2 cache
      - two way set associative
  - G5
    - 32 KB instruction cache
    - 64 KB data cache

[Figure: PowerPC G5 Block Diagram]

Internet Sources
  - Manufacturer
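As a recap of the three mapping functions in this chapter, the sketch below splits a 24-bit address into tag, line/set, and word fields using the field widths given on the address-structure slides (8/14/2 for direct mapping, 22/2 for associative, 9/13/2 for two-way set associative). The function name `split_address` and the dictionary names are illustrative assumptions, not part of the course material. Tags are printed as right-aligned values, which may differ from how a particular slide writes its tag column.

```python
def split_address(addr, tag_bits, index_bits, word_bits):
    """Split an address into (tag, index, word), most significant field first.

    'index' is the cache line number for direct mapping and the set number
    for set associative mapping; it is always 0 when index_bits is 0.
    """
    word = addr & ((1 << word_bits) - 1)
    index = (addr >> word_bits) & ((1 << index_bits) - 1)
    tag = (addr >> (word_bits + index_bits)) & ((1 << tag_bits) - 1)
    return tag, index, word


# Field widths from the slides: 24-bit address, 4-byte blocks, 64 KB cache.
DIRECT = dict(tag_bits=8, index_bits=14, word_bits=2)
ASSOCIATIVE = dict(tag_bits=22, index_bits=0, word_bits=2)
TWO_WAY = dict(tag_bits=9, index_bits=13, word_bits=2)

if __name__ == "__main__":
    addr = 0xFFFFFC
    for name, widths in [("direct", DIRECT),
                         ("associative", ASSOCIATIVE),
                         ("two-way set associative", TWO_WAY)]:
        tag, index, word = split_address(addr, **widths)
        print(f"{name}: tag={tag:X} index={index:X} word={word:X}")
```

For the set associative example address written on the slide as "1FF 7FFC" (tag 1FF followed by the remaining 15 bits), the full 24-bit address is FFFFFC, and the sketch yields tag 1FF and set 1FFF, matching the slide's table.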

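The direct mapping drawback (two blocks that map to the same line keep evicting each other) and the LRU replacement policy can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions: the class `ToyCache` is hypothetical, it tracks only block tags (no data, no write policy), and it indexes by block number modulo the number of sets.

```python
from collections import OrderedDict


class ToyCache:
    """Toy cache that tracks block tags only, with LRU replacement per set."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set; key order runs from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, block):
        """Reference a main-memory block number; return True on a hit."""
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        self.misses += 1
        return False


# Both caches hold 4 lines in total; blocks 0 and 4 collide in each of them.
direct = ToyCache(num_sets=4, ways=1)    # direct mapped: one line per set
two_way = ToyCache(num_sets=2, ways=2)   # two-way set associative
for block in [0, 4, 0, 4, 0, 4]:
    direct.access(block)
    two_way.access(block)

print("direct :", direct.hits, "hits,", direct.misses, "misses")
print("two-way:", two_way.hits, "hits,", two_way.misses, "misses")
```

With one line per set the class behaves as a direct-mapped cache and every reference in the alternating pattern misses; with two lines per set both blocks fit in the same set, so only the first reference to each block misses.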