




CSCE 930 Advanced Computer Architecture
Introduction
Adapted from Professor David Patterson & David Culler
Electrical Engineering and Computer Sciences
University of California, Berkeley

Outline
• Computer Science at a Crossroads: Parallelism
  - Architecture: multi-core and many-cores
  - Program: multi-threading
• Parallel Architecture
  - What is Parallel Architecture?
  - Why Parallel Architecture?
  - Evolution and Convergence of Parallel Architectures
  - Fundamental Design Issues
• Parallel Programs
  - Why bother with programs?
  - Important for whom?
• Memory & Storage Subsystem Architectures
11/5/2011 CSCE 930 - Advanced Computer Architecture, Introduction

Crossroads: Conventional Wisdom in Comp. Arch
• Old Conventional Wisdom: Power is free, Transistors expensive
• New Conventional Wisdom: "Power wall": Power expensive, Xtors free (Can put more on chip than can afford to turn on)
• Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (Out-of-order, speculation, VLIW, ...)
• New CW: "ILP wall": law of diminishing returns on more HW for ILP
• Old CW: Multiplies are slow, Memory access is fast
• New CW: "Memory wall": Memory slow, multiplies fast (200 clock cycles to DRAM memory, 4 clocks for multiply)
• Old CW: Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
  - Uniprocessor performance now 2X / 5(?) yrs
  => Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
    » More, simpler processors are more power efficient
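
The doubling periods quoted on this slide can be converted into annual growth rates, which makes the slowdown easier to compare with per-year figures. The following is a minimal sketch; the function name `annual_growth_rate` is illustrative and does not come from the slides.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Return the yearly growth rate (as a fraction) implied by a
    quantity that doubles every `doubling_years` years."""
    if doubling_years <= 0:
        raise ValueError("doubling_years must be positive")
    return 2.0 ** (1.0 / doubling_years) - 1.0


# Doubling periods quoted on the slide.
old_cw = annual_growth_rate(1.5)  # uniprocessor performance, old CW
new_cw = annual_growth_rate(5.0)  # uniprocessor performance, new CW (the slide marks the 5 with "?")
cores = annual_growth_rate(2.0)   # processors per chip

for label, rate in [("2X / 1.5 yrs", old_cw),
                    ("2X / 5 yrs", new_cw),
                    ("2X / 2 yrs", cores)]:
    print(f"{label}: {rate:.1%} per year")
```

Under this conversion, 2X every 1.5 years is roughly 59% per year, while 2X every 5 years is roughly 15% per year.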

Crossroads: Uniprocessor Performance
[Figure: uniprocessor performance (log scale, 1 to 10000) versus year, 1978 to 2006.]
• VAX: 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
• RISC + x86: ??%/year 2002 to present
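
To see what these annual rates amount to over each era, the growth can be compounded year by year. This is a rough sketch that assumes the quoted rate holds uniformly across the whole period; `cumulative_gain` is a hypothetical helper name.

```python
def cumulative_gain(annual_rate: float, start_year: int, end_year: int) -> float:
    """Total improvement factor from compounding `annual_rate`
    (a fraction, e.g. 0.25 for 25%) from start_year to end_year."""
    years = end_year - start_year
    if years < 0:
        raise ValueError("end_year must not precede start_year")
    return (1.0 + annual_rate) ** years


vax_era = cumulative_gain(0.25, 1978, 1986)   # VAX: 25%/year
risc_era = cumulative_gain(0.52, 1986, 2002)  # RISC + x86: 52%/year

print(f"1978-1986: about {vax_era:.1f}x")
print(f"1986-2002: about {risc_era:.0f}x")
print(f"1978-2002: about {vax_era * risc_era:.0f}x")
```

With these inputs the first era compounds to roughly 6x and the second to roughly 800x, which is why the change in slope after 1986 dominates the plot.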

Sea Change in Chip Design
• Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
• RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
• 125 mm² chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
  - RISC II shrinks to ~0.02 mm² at 65 nm
  - Caches via DRAM or 1 transistor SRAM
  - Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
• Processor is the new transistor?
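
The shrink figure on this slide can be approximated with a back-of-the-envelope calculation. The sketch below assumes ideal scaling, where die area shrinks with the square of the feature size; that assumption is not stated on the slide, and `scaled_area_mm2` is an illustrative name.

```python
def scaled_area_mm2(area_mm2: float, old_feature_um: float, new_feature_um: float) -> float:
    """Estimate die area after a process shrink, assuming area scales
    with the square of the feature size (an idealization)."""
    return area_mm2 * (new_feature_um / old_feature_um) ** 2


# RISC II: 60 mm^2 at 3 micron, moved to 0.065 micron (65 nm).
risc2_at_65nm = scaled_area_mm2(60.0, 3.0, 0.065)

# Area taken by 2312 such cores on the 125 mm^2 chip from the slide.
cores_area = 2312 * risc2_at_65nm
left_over = 125.0 - cores_area  # remaining for FPUs and caches

print(f"RISC II at 65 nm: {risc2_at_65nm:.3f} mm^2")
print(f"2312 cores: {cores_area:.1f} mm^2, left over: {left_over:.1f} mm^2")
```

Ideal scaling gives a little under 0.03 mm² per core, the same order of magnitude as the ~0.02 mm² quoted on the slide, and leaves roughly half of the 125 mm² die for FPUs and caches.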

Deja vu all over again?
• Multiprocessors imminent in 1970s, '80s, '90s, ...
• "... today's processors ... are nearing an impasse as technologies approach the speed of light.." David Mitchell, The Transputer: The Time Is Now (1989)
• Transputer was premature
  - Custom multiprocessors strove to lead uniprocessors
  - Procrastination rewarded: 2X seq. perf. / 1.5 years
• "We are dedicating all of our future product development to multicore designs. This is a sea change in computing" Paul Otellini, President, Intel (2004)
• Difference is all microprocessor companies switch to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2 CPUs)
  - Procrastination penalized: 2X sequential perf. / 5 yrs
  - Biggest programming challenge: 1 to 2 CPUs

Problems with Sea Change
• Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, ... not ready to supply Thread Level Parallelism or Data Level Parallelism for 1000 CPUs / chip
• Architectures not ready for 1000 CPUs / chip
  - Unlike Instruction Level Parallelism, cannot be solved by computer architects and compiler writers alone, but also cannot be solved without participation of computer architects
• This course explores ILP (Instruction Level Parallelism) and its shift to Thread Level Parallelism / Data Level Parallelism

What is Parallel Architecture?
• A parallel computer is a collection of processing elements that cooperate to solve large problems fast
• Some broad issues:
  - Resource Allocation:
    » how large a collection?
    » how powerful are the elements?
    » how much memory?
  - Data access, Communication and Synchronization
    » how do the elements cooperate and communicate?
    » how are data transmitted between processors?
    » what are the abstractions and primitives for cooperation?
  - Performance and Scalability
    » how does it all translate into performance?
    » how does it scale?

Why Study Parallel Architecture?
Role of a computer architect:
  To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost
Parallelism:
• Provides alternative to faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing

Why Study it Today?
• History: diverse and innovative organizational structures, often tied to novel programming models
• Rapidly maturing under strong technological constraints
  - The "killer micro" is ubiquitous
  - Laptops and supercomputers are fundamentally similar!
  - Technological trends cause diverse approaches to converge
• Technological trends make parallel computing inevitable
  - In the mainstream with the reality of multi-cores and many-cores
• Need to understand fundamental principles and design tradeoffs, not just taxonomies
  - Naming, Ordering, Replication, Communication performance

Inevitability of Parallel Computing
• Application demands: Our insatiable need for computing cycles
  - Scientific computing: VR simulations in Biology, Chemistry, Physics, ...
  - General-purpose computing: Video, Graphics, CAD, Databases, AR, VI, TP ...
• Technology Trends
  - Number of cores on chip growing rapidly (New Moore's Law)
  - Clock rates expected to go up only slowly (tech. wall)
• Architecture Trends
  - Instruction-level parallelism valuable but limited
  - Coarser-level parallelism, or thread-level parallelism, the most viable approach
• Economics
• Current trends:
  - Today's microprocessors are multiprocessors

Application Trends
• Demand for cycles fuels advances in hardware, and vice-versa
  - Cycle drives exponential increase in microprocessor performance
  - Drives parallel architecture harder: most demanding applications
• Range of performance demands
  - Need range of system performance with progressively increasing cost
  - Platform pyramid
• Goal of applications in using parallel machines: Speedup
• $\text{Speedup}(p \text{ processors}) = \dfrac{\text{Performance}(p \text{ processors})}{\text{Performance}(1 \text{ processor})}$
• For a fixed problem size (input data set), performance = 1/time
• $\text{Speedup}_{\text{fixed problem}}(p \text{ processors}) = \dfrac{\text{Time}(1 \text{ processor})}{\text{Time}(p \text{ processors})}$
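
The fixed-problem speedup definition is easy to apply to measured run times. The sketch below is illustrative only: the timings are made-up numbers, and the `efficiency` helper (speedup divided by processor count) is a common companion metric that is not defined on the slide.

```python
def speedup(time_1: float, time_p: float) -> float:
    """Fixed-problem speedup: Time(1 processor) / Time(p processors)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("times must be positive")
    return time_1 / time_p


def efficiency(time_1: float, time_p: float, p: int) -> float:
    """Speedup per processor; 1.0 would mean perfect linear scaling.
    (Not from the slide; added here for illustration.)"""
    return speedup(time_1, time_p) / p


# Hypothetical measurements for the same input data set.
t1 = 120.0  # seconds on 1 processor
t8 = 20.0   # seconds on 8 processors

print(f"speedup on 8 processors: {speedup(t1, t8):.1f}")
print(f"efficiency: {efficiency(t1, t8, 8):.2f}")
```

Because the problem size is held fixed, performance is simply 1/time, so the ratio of performances reduces to the ratio of run times used here.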

Scientific Computing Demand
[Figure: scientific applications placed by size (vertical axis, 10 MB to 1 TB) and computational performance requirement (horizontal axis, 100 MFLOPS to 1 TFLOPS). Labeled applications: 2D airfoil, oil reservoir modeling, 48-hour weather, 72-hour weather, 3D plasma modeling, chemical dynamics, vehicle signature, pharmaceutical design, structural biology, vision, quantum chromodynamics, superconductor modeling, viscous fluid dynamics, ocean circulation, vehicle dynamics, fluid turbulence, human genome, global change. The most demanding are grouped as Grand Challenge problems.]

Engineering Computing Demand
• Large parallel machines a mainstay in many industries
  - Petroleum (reservoir analysis)
  - Automotive (crash simulation, drag analysis, combustion efficiency)
  - Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
  - Computer-aided design
  - Pharmaceuticals (molecular modeling)
  - Visualization
    » In all of the above
    » Entertainment (3D films like Avatar & 3D games)
    » Architecture (walk-throughs and rendering)
    » Virtual Reality / Immersion (museums, teleporting, etc.)
  - Financial modeling (yield and derivative analysis)
  - Etc.

Learning Curve for Parallel Applications
[Figure: learning curve plotted against number of processors.]
• AMBER molecular dynamics simulation program
• Starting point was vector code for Cray-1
• 145 MFLOP on Cray 90, 406 for final version on 128-processor Paragon, 891 on 128-processor Cray T3D

Commercial Computing
• Also relies on parallelism for high end
  - Scale not so large, but use much more wide-spread
  - Computational power determines scale of business that can be handled
• Databases, online-transaction processing, decision support, data mining, data warehousing ...
• TPC benchmarks (TPC-C order entry, TPC-D decision support)
  - Explicit scaling criteria provided
  - Size of enterprise scales with size of system
  - Problem size no longer fixed as p increases, so throughput is used as a performance measure (transactions per minute or tpm)

Similar Story for Storage
• Divergence between memory capacity and speed more pronounced
  - Capacity increased by 1000x from 1980-95, speed only 2x
  - Gigabit DRAM by c. 2000, but gap with processor speed much greater
• Larger memories are slower, while processors get faster
  - Need to transfer more data in parallel
  - Need deeper cache hierarchies
  - How to organize caches?
• Parallelism increases effective size of each level of hierarchy, without increasing access time
• Parallelism and locality within memory systems too
  - New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface
  - Buffer caches most recently accessed data
• Disks too: Parallel disks plus caching

Real-world applications demand high-performing and reliable storage
[Figure: example storage demands. Panels: high performance computing; medical image; virtual reality (panel figures: 100 TB, 100 TB, 1 PB, 5 GB/day); digital body, 1 TB/body; GIS, > 1 PB; ocean resource data, > 1 PB; Google, Yahoo, ..., > 1 PB/year; oil prospecting, 1 PB.]
1 PB = 1000 TB = 10^15 Bytes. It is equal to the capacity of 10,000 100 GB disks.
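
The unit arithmetic on this slide can be checked directly. This small sketch uses decimal (power-of-ten) units, as the slide does; `disks_needed` is an illustrative helper name.

```python
# Decimal storage units, matching the slide's 1 PB = 10^15 bytes.
GB = 10 ** 9
TB = 10 ** 12
PB = 10 ** 15


def disks_needed(total_bytes: int, disk_bytes: int) -> int:
    """Number of disks required to hold total_bytes, rounded up."""
    if disk_bytes <= 0:
        raise ValueError("disk_bytes must be positive")
    return -(-total_bytes // disk_bytes)  # ceiling division on integers


print(PB // TB)                        # terabytes per petabyte
print(disks_needed(PB, 100 * GB))      # 100 GB disks per petabyte
```

Rounding up matters once the data set is not an exact multiple of the disk size, which is the usual case in practice.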

Technology Trends: Moore's Law: 2X transistors / "year" => 2X cores / nn years
[Figure: log2 of the number of components per integrated function (0 to 16) versus year, 1959 to 1975.]
• "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
• # of transistors / cost-effective integrated circuit double every N months (12 ≤ N ≤ 24)

Tracking Technology Performance Trends
• Drill down into 4 technologies:
  - Disks,
  - Memory,
  - Network,
  - Processors
• Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
  - Performance Milestones in each technology
• Compare for Bandwidth vs. Latency improvements in performance over time
• Bandwidth: number of events per unit time
  - E.g., Mbits/second over network, Mbytes/second from disk
• Latency: elapsed time for a single event
  - E.g., one-way network delay in microseconds, average disk access time in milliseconds
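
One way to see how the two metrics interact is a first-order model in which a transfer costs one latency plus the data size divided by bandwidth. This additive model is an assumption made for illustration, not something stated on the slides; the example figures (5.7 ms, 86 MBytes/sec) are the 2003 disk numbers that appear later in this deck.

```python
def transfer_time_s(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """First-order transfer time: fixed latency plus size / bandwidth."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def effective_bandwidth(size_bytes: float, latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Bytes per second actually achieved once latency is included."""
    return size_bytes / transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s)


LATENCY_S = 5.7e-3    # average disk access time
BANDWIDTH = 86e6      # peak transfer rate in bytes/second

small = effective_bandwidth(4096, LATENCY_S, BANDWIDTH)        # one 4 KB block
large = effective_bandwidth(100e6, LATENCY_S, BANDWIDTH)       # a 100 MB stream

print(f"4 KB request:  {small / 1e6:.2f} MB/s effective")
print(f"100 MB stream: {large / 1e6:.2f} MB/s effective")
```

In this model small requests are dominated by latency and see only a small fraction of peak bandwidth, while large sequential transfers approach it.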

Disks: Archaic (Nostalgic) v. Modern (Newfangled)
                  CDC Wren I, 1983     Seagate 373453, 2003
Rotation speed    3600 RPM             15000 RPM (4X)
Capacity          0.03 GBytes          73.4 GBytes (2500X)
Tracks/inch       800                  64000 (80X)
Bits/inch         9550                 533,000 (60X)
Platters          Three 5.25"          Four 2.5" (in 3.5" form factor)
Bandwidth         0.6 MBytes/sec       86 MBytes/sec (140X)
Latency           48.3 ms              5.7 ms (8X)
Cache             none                 8 MBytes
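
The improvement factors in this comparison follow directly from the raw figures. The sketch below recomputes them, and also estimates average rotational latency as half a revolution, a standard rule of thumb that is not on the slide. The dictionary keys are illustrative names.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational delay: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


# Figures copied from the slide.
wren_1983 = {"rpm": 3600, "capacity_gb": 0.03, "bandwidth_mb_s": 0.6, "latency_ms": 48.3}
seagate_2003 = {"rpm": 15000, "capacity_gb": 73.4, "bandwidth_mb_s": 86.0, "latency_ms": 5.7}

# For bandwidth and capacity, bigger is better: new / old.
bandwidth_gain = seagate_2003["bandwidth_mb_s"] / wren_1983["bandwidth_mb_s"]
capacity_gain = seagate_2003["capacity_gb"] / wren_1983["capacity_gb"]
# For latency, smaller is better: old / new.
latency_gain = wren_1983["latency_ms"] / seagate_2003["latency_ms"]

print(f"bandwidth: {bandwidth_gain:.0f}x, capacity: {capacity_gain:.0f}x, latency: {latency_gain:.1f}x")
for disk in (wren_1983, seagate_2003):
    print(f"{disk['rpm']} RPM -> {avg_rotational_latency_ms(disk['rpm']):.2f} ms average rotational latency")
```

The slide rounds these to 140X, 2500X and 8X. Rotation speed rose only about 4X, which is one reason latency improved far less than bandwidth.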

Latency Lags Bandwidth (for last ~20 years)
[Figure: performance milestones plotted against Relative Latency Improvement.]
• Performance Milestones
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)

Memory: Archaic (Nostalgic) v. Modern (Newfangled)
                  1980 DRAM (asynchronous)                  2000 Double Data Rate Synchr. (clocked) DRAM
Capacity          0.06 Mbits/chip                           256.00 Mbits/chip (4000X)
Transistors, die  64,000 xtors, 35 mm²                      256,000,000 xtors, 204 mm²
Data bus          16-bit data bus per module, 16 pins/chip  64-bit data bus per DIMM, 66 pins/chip (4X)
Bandwidth         13 Mbytes/sec                             1600 Mbytes/sec (120X)
Latency           225 ns                                    52 ns (4X)
Block transfer    (no block transfer)                       Block transfers (page mode)

Latency Lags Bandwidth (last ~20 years)
[Figure: performance milestones plotted against Relative Latency Improvement.]
• Performance Milestones
• Memory Module: 16 bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)
                  Ethernet 802.3       Ethernet 802.3ae
Year of Standard  1978                 2003
Link speed        10 Mbits/s           10,000 Mbits/s (1000X)
Latency           3000 µsec            190 µsec (15X)
Media             Shared media         Switched media
Cable             Coaxial cable        Category 5 copper wire
"Cat 5" is 4 twisted pairs in bundle
[Figure: coaxial cable (plastic covering, braided outer conductor, insulator, copper core) and twisted pair (copper, 1 mm thick, twisted to avoid antenna effect).]

Latency Lags Bandwidth (last ~20 years)
[Figure: performance milestones plotted against Relative Latency Improvement.]
• Performance Milestones
• Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16 bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)
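
The (latency, bandwidth) improvement pairs listed above can be compared on a common footing by asking what power of the latency gain equals the bandwidth gain. This exponent is a derived quantity introduced here for illustration; the slides give only the raw pairs.

```python
import math

# (latency improvement, bandwidth improvement) pairs from the milestones.
milestones = {
    "Ethernet": (16, 1000),
    "Memory module": (4, 120),
    "Disk": (8, 143),
}


def bandwidth_exponent(latency_gain: float, bandwidth_gain: float) -> float:
    """Solve bandwidth_gain = latency_gain ** k for k."""
    return math.log(bandwidth_gain) / math.log(latency_gain)


exponents = {name: bandwidth_exponent(lat, bw) for name, (lat, bw) in milestones.items()}

for name, k in exponents.items():
    lat, bw = milestones[name]
    print(f"{name}: latency {lat}x, bandwidth {bw}x, bandwidth ~ latency^{k:.2f}")
```

For all three technologies the exponent comes out above 2, meaning bandwidth grew by more than the square of the latency improvement over the same period.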

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)
1982 Intel 80286          2001 Intel Pentium 4
12.5 MHz