CSCE 930 Advanced Computer Architecture

Introductions

Adapted from Professor David Patterson & David Culler

Electrical Engineering and Computer Sciences

University of California, Berkeley

Outline

• Computer Science at a Crossroads: Parallelism
- Architecture: multi-core and many-cores
- Program: multi-threading
• Parallel Architecture
- What is Parallel Architecture?
- Why Parallel Architecture?
- Evolution and Convergence of Parallel Architectures
- Fundamental Design Issues
• Parallel Programs
- Why bother with programs?
- Important for whom?
• Memory & Storage Subsystem Architectures

11/5/2011 CSCE 930 - Advanced Computer Architecture, Introduction

Crossroads: Conventional Wisdom in Comp. Arch

• Old Conventional Wisdom: Power is free, Transistors expensive
• New Conventional Wisdom: "Power wall": Power expensive, Xtors free
  (Can put more on chip than can afford to turn on)
• Old CW: Sufficiently increasing Instruction Level Parallelism via compilers, innovation (Out-of-order, speculation, VLIW, ...)
• New CW: "ILP wall": law of diminishing returns on more HW for ILP
• Old CW: Multiplies are slow, Memory access is fast
• New CW: "Memory wall": Memory slow, multiplies fast
  (200 clock cycles to DRAM memory, 4 clocks for multiply)
• Old CW: Uniprocessor performance 2X / 1.5 yrs
• New CW: Power Wall + ILP Wall + Memory Wall = Brick Wall
- Uniprocessor performance now 2X / 5(?) yrs
=> Sea change in chip design: multiple "cores" (2X processors per chip / ~2 years)
» More, simpler processors are more power efficient

Crossroads: Uniprocessor Performance

[Figure: uniprocessor performance (log scale, 1 to 10000) by year, 1978 to 2006]

• VAX: 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
• RISC + x86: ??%/year 2002 to present
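The slides quote growth both as a percentage per year (25%, 52%) and as a doubling period (2X / 1.5 yrs, 2X / 5 yrs). The sketch below converts between the two, assuming smooth compound growth; the function names are illustrative and not part of the course material.

```python
import math

def doubling_time_years(rate):
    """Years needed to double at a compound annual growth rate (0.52 means 52%/year)."""
    return math.log(2) / math.log(1 + rate)

def annual_growth(doubling_years):
    """Compound annual growth rate implied by doubling every `doubling_years` years."""
    return 2 ** (1 / doubling_years) - 1

# Rates quoted on the uniprocessor performance slide.
for label, rate in [("VAX, 25%/year", 0.25), ("RISC + x86, 52%/year", 0.52)]:
    print(f"{label}: doubles every {doubling_time_years(rate):.2f} years")

# Doubling periods quoted on the conventional-wisdom slide.
for years in (1.5, 5):
    print(f"2X every {years} years is {annual_growth(years):.1%} per year")
```

Under this assumption, 52%/year and "2X / 1.5 yrs" describe nearly the same pace, while "2X / 5 yrs" corresponds to a much lower annual rate.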

Sea Change in Chip Design

• Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
• RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
• 125 mm² chip, 0.065 micron CMOS = 2312 RISC II + FPU + Icache + Dcache
- RISC II shrinks to ~0.02 mm² at 65 nm
- Caches via DRAM or 1 transistor SRAM
- Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
• Processor is the new transistor?
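The shrink claim for RISC II can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes ideal scaling, where die area shrinks with the square of the feature size; real layouts do not scale this cleanly, so the result only shows that the slide's ~0.02 mm² is the right order of magnitude (ideal scaling gives a value close to 0.03 mm²). The function name is illustrative.

```python
def scaled_area_mm2(area_mm2, old_feature_um, new_feature_um):
    """Area after an ideal shrink: linear dimensions scale with feature size."""
    return area_mm2 * (new_feature_um / old_feature_um) ** 2

# RISC II: 60 mm^2 at 3 micron, shrunk to 0.065 micron (65 nm).
risc2_at_65nm = scaled_area_mm2(60.0, 3.0, 0.065)
print(f"RISC II at 65 nm: about {risc2_at_65nm:.3f} mm^2")

# Upper bound on cores in a 125 mm^2 die if nothing else were on the chip.
print(f"Ideal-shrink cores in 125 mm^2: {int(125.0 // risc2_at_65nm)}")
```

The slide's count of 2312 cores is lower than this bound, which is expected since its 125 mm² budget also covers the FPU, Icache, and Dcache.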

Déjà vu all over again?

• Multiprocessors imminent in 1970s, '80s, '90s, ...
• "... today's processors ... are nearing an impasse as technologies approach the speed of light.."
  David Mitchell, The Transputer: The Time Is Now (1989)
• Transputer was premature
- Custom multiprocessors strove to lead uniprocessors
- Procrastination rewarded: 2X seq. perf. / 1.5 years
• "We are dedicating all of our future product development to multicore designs. This is a sea change in computing"
  Paul Otellini, President, Intel (2004)
• Difference is all microprocessor companies switch to multiprocessors (AMD, Intel, IBM, Sun; all new Apples 2 CPUs)
- Procrastination penalized: 2X sequential perf. / 5 yrs
- Biggest programming challenge: 1 to 2 CPUs

Problems with Sea Change

• Algorithms, Programming Languages, Compilers, Operating Systems, Architectures, Libraries, ... not ready to supply Thread Level Parallelism or Data Level Parallelism for 1000 CPUs / chip
• Architectures not ready for 1000 CPUs / chip
- Unlike Instruction Level Parallelism, cannot be solved by computer architects and compiler writers alone, but also cannot be solved without participation of computer architects
• This course explores ILP (Instruction Level Parallelism) and its shift to Thread Level Parallelism / Data Level Parallelism


What is Parallel Architecture?

• A parallel computer is a collection of processing elements that cooperate to solve large problems fast
• Some broad issues:
- Resource Allocation:
  » how large a collection?
  » how powerful are the elements?
  » how much memory?
- Data access, Communication and Synchronization
  » how do the elements cooperate and communicate?
  » how are data transmitted between processors?
  » what are the abstractions and primitives for cooperation?
- Performance and Scalability
  » how does it all translate into performance?
  » how does it scale?

Why Study Parallel Architecture?

Role of a computer architect:
To design and engineer the various levels of a computer system to maximize performance and programmability within limits of technology and cost

Parallelism:
• Provides alternative to faster clock for performance
• Applies at all levels of system design
• Is a fascinating perspective from which to view architecture
• Is increasingly central in information processing

Why Study it Today?

• History: diverse and innovative organizational structures, often tied to novel programming models
• Rapidly maturing under strong technological constraints
- The "killer micro" is ubiquitous
- Laptops and supercomputers are fundamentally similar!
- Technological trends cause diverse approaches to converge
• Technological trends make parallel computing inevitable
- In the mainstream with the reality of multi-cores and many-cores
• Need to understand fundamental principles and design tradeoffs, not just taxonomies
- Naming, Ordering, Replication, Communication performance

Inevitability of Parallel Computing

• Application demands: Our insatiable need for computing cycles
- Scientific computing: VR simulations in Biology, Chemistry, Physics, ...
- General-purpose computing: Video, Graphics, CAD, Databases, AR, VI, TP, ...
• Technology Trends
- Number of cores on chip growing rapidly (New Moore's Law)
- Clock rates expected to go up only slowly (tech. wall)
• Architecture Trends
- Instruction-level parallelism valuable but limited
- Coarser-level parallelism, or thread-level parallelism, the most viable approach
• Economics
• Current trends:
- Today's microprocessors are multiprocessors

Application Trends

• Demand for cycles fuels advances in hardware, and vice-versa
- Cycle drives exponential increase in microprocessor performance
- Drives parallel architecture harder: most demanding applications
• Range of performance demands
- Need range of system performance with progressively increasing cost
- Platform pyramid
• Goal of applications in using parallel machines: Speedup
• $\text{Speedup}(p\ \text{processors}) = \dfrac{\text{Performance}(p\ \text{processors})}{\text{Performance}(1\ \text{processor})}$
• For a fixed problem size (input data set), performance = 1/time
• $\text{Speedup}_{\text{fixed problem}}(p\ \text{processors}) = \dfrac{\text{Time}(1\ \text{processor})}{\text{Time}(p\ \text{processors})}$
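The two speedup definitions agree once performance is taken as 1/time for a fixed problem size. A minimal sketch follows; the timings are made-up illustrative values, not measurements from the course.

```python
def performance(time_s):
    """Performance for a fixed problem size, defined as 1 / time."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return 1.0 / time_s

def speedup_fixed_problem(time_1, time_p):
    """Speedup for a fixed problem: Time(1 processor) / Time(p processors)."""
    if time_1 <= 0 or time_p <= 0:
        raise ValueError("times must be positive")
    return time_1 / time_p

# Hypothetical run: 120 s on 1 processor, 20 s on 8 processors.
t1, t8 = 120.0, 20.0
print("speedup from times:      ", speedup_fixed_problem(t1, t8))
print("speedup from performance:", performance(t8) / performance(t1))
```

Both expressions give the same ratio up to floating-point rounding, because dividing 1/Time(p) by 1/Time(1) simplifies to Time(1)/Time(p).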

Scientific Computing Demand

[Figure: scientific applications plotted on a capacity axis (ticks from 10 MB to 1 TB) against computational performance requirement (100 MFLOPS to 1 TFLOPS). Labels in the plot: 2D airfoil, oil reservoir modeling, 48-hour weather, 3D plasma modeling, chemical dynamics, 72-hour weather, vehicle signature, structural biology, pharmaceutical design, and the Grand Challenge problems (global change, human genome, fluid turbulence, vehicle dynamics, ocean circulation, viscous fluid dynamics, superconductor modeling, quantum chromodynamics, vision).]

Engineering Computing Demand

• Large parallel machines a mainstay in many industries
- Petroleum (reservoir analysis)
- Automotive (crash simulation, drag analysis, combustion efficiency)
- Aeronautics (airflow analysis, engine efficiency, structural mechanics, electromagnetism)
- Computer-aided design
- Pharmaceuticals (molecular modeling)
- Visualization
  » In all of the above
  » Entertainment (3D films like Avatar & 3D games)
  » Architecture (walk-throughs and rendering)
  » Virtual Reality / Immersion (museums, teleporting, etc.)
- Financial modeling (yield and derivative analysis)
- Etc.

Learning Curve for Parallel Applications

[Figure: learning curve plotted against number of processors]

• AMBER molecular dynamics simulation program
• Starting point was vector code for Cray-1
• 145 MFLOP on Cray90, 406 for final version on 128-processor Paragon, 891 on 128-processor Cray T3D

Commercial Computing

• Also relies on parallelism for high end
- Scale not so large, but use much more wide-spread
- Computational power determines scale of business that can be handled
• Databases, online-transaction processing, decision support, data mining, data warehousing ...
• TPC benchmarks (TPC-C order entry, TPC-D decision support)
- Explicit scaling criteria provided
- Size of enterprise scales with size of system
- Problem size no longer fixed as p increases, so throughput is used as a performance measure (transactions per minute or tpm)

Similar Story for Storage

• Divergence between memory capacity and speed more pronounced
- Capacity increased by 1000x from 1980-95, speed only 2x
- Gigabit DRAM by c. 2000, but gap with processor speed much greater
• Larger memories are slower, while processors get faster
- Need to transfer more data in parallel
- Need deeper cache hierarchies
- How to organize caches?
• Parallelism increases effective size of each level of hierarchy, without increasing access time
• Parallelism and locality within memory systems too
- New designs fetch many bits within memory chip; follow with fast pipelined transfer across narrower interface
- Buffer caches most recently accessed data
• Disks too: Parallel disks plus caching

Real-world applications demand high-performing and reliable storage

[Figure: example storage demands. Panels include High performance Computing, Medical Image, and Virtual Reality (Visualization and Imaging Research Centre, University of Hong Kong), with figures of 100 TB, 100 TB, 1 PB, and 5 GB/day shown alongside them; Digital body, 1 TB/body; GIS, > 1 PB; Ocean resource data, > 1 PB; Google, Yahoo, ..., > 1 PB/year; Oil prospecting, 1 PB.]

1 PB = 1000 TB = 10^15 Bytes.
It is equal to the capacity of 10,000 100 GB disks.
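The unit arithmetic above can be checked directly. This is a small sketch using decimal (power-of-ten) units, as the slide does; `disks_needed` is an illustrative helper name.

```python
GB = 10**9
TB = 10**12
PB = 1000 * TB  # decimal petabyte, as on the slide

def disks_needed(capacity_bytes, disk_bytes):
    """Smallest number of disks whose combined capacity covers capacity_bytes."""
    return -(-capacity_bytes // disk_bytes)  # ceiling division on integers

print("1 PB in bytes:", PB)
print("100 GB disks for 1 PB:", disks_needed(PB, 100 * GB))
```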

Technology Trends: Moore's Law: 2X transistors / year, 2X cores / [period illegible]

[Figure: log2 of the number of components per integrated function versus year, 1959 to 1975]

• "Cramming More Components onto Integrated Circuits", Gordon Moore, Electronics, 1965
• # of transistors / cost-effective integrated circuit doubles every N months (12 ≤ N ≤ 24)
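The range 12 ≤ N ≤ 24 matters more than it may look, because the doubling compounds. The sketch below assumes steady doubling every N months and compares the growth over a decade; the 10-year horizon and the function name are illustrative choices.

```python
def growth_factor(years, months_per_doubling):
    """Multiplicative growth after `years` if the count doubles every N months."""
    if months_per_doubling <= 0:
        raise ValueError("months_per_doubling must be positive")
    return 2 ** (years * 12 / months_per_doubling)

# Compare the two ends of the quoted range and the midpoint.
for n in (12, 18, 24):
    print(f"N = {n} months: x{growth_factor(10, n):,.0f} over 10 years")
```

Doubling every 12 months gives ten doublings in a decade, while doubling every 24 months gives only five, so the two ends of the range differ by a factor of 32 after 10 years.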

Tracking Technology Performance Trends

• Drill down into 4 technologies:
- Disks,
- Memory,
- Network,
- Processors
• Compare ~1980 Archaic (Nostalgic) vs. ~2000 Modern (Newfangled)
- Performance Milestones in each technology
• Compare for Bandwidth vs. Latency improvements in performance over time
• Bandwidth: number of events per unit time
- E.g., M bits / second over network, M bytes / second from disk
• Latency: elapsed time for a single event
- E.g., one-way network delay in microseconds, average disk access time in milliseconds

Disks: Archaic (Nostalgic) v. Modern (Newfangled)

                  CDC Wren I, 1983     Seagate 373453, 2003
Rotation speed    3600 RPM             15000 RPM (4X)
Capacity          0.03 GBytes          73.4 GBytes (2500X)
Tracks/inch       800                  64000 (80X)
Bits/inch         9550                 533,000 (60X)
Platters          Three 5.25"          Four 2.5" (in 3.5" form factor)
Bandwidth         0.6 MBytes/sec       86 MBytes/sec (140X)
Latency           48.3 ms              5.7 ms (8X)
Cache             none                 8 MBytes
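To see how the two disk metrics combine, the sketch below estimates the time for a single request as latency plus size divided by bandwidth. This linear model is an assumption made for illustration (it ignores caching, queueing, and sequential access); the latency and bandwidth figures are the ones quoted for the two disks, and the request sizes are arbitrary.

```python
def transfer_time_ms(size_bytes, latency_ms, bandwidth_mb_per_s):
    """Time for one request under the assumed model: latency + size / bandwidth."""
    streaming_ms = size_bytes / (bandwidth_mb_per_s * 1e6) * 1e3
    return latency_ms + streaming_ms

# (latency in ms, bandwidth in MBytes/sec) as quoted for each disk.
DISKS = {
    "CDC Wren I (1983)": (48.3, 0.6),
    "Seagate 373453 (2003)": (5.7, 86.0),
}

for size in (4_096, 1_000_000):
    old = transfer_time_ms(size, *DISKS["CDC Wren I (1983)"])
    new = transfer_time_ms(size, *DISKS["Seagate 373453 (2003)"])
    print(f"{size:>9} bytes: {old:8.2f} ms -> {new:6.2f} ms ({old / new:.1f}x faster)")
```

Under this model small requests are dominated by latency, so they gain little more than the 8X latency improvement, while large requests approach the bandwidth improvement.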

Latency Lags Bandwidth (for last ~20 years)

[Figure: performance milestones plotted against relative latency improvement]

Performance Milestones
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)

Memory: Archaic (Nostalgic) v. Modern (Newfangled)

                    1980 DRAM (asynchronous)           2000 Double Data Rate Synchr. (clocked) DRAM
Capacity            0.06 Mbits/chip                    256.00 Mbits/chip (4000X)
Transistors, area   64,000 xtors, 35 mm²               256,000,000 xtors, 204 mm²
Data bus            16-bit per module, 16 pins/chip    64-bit per DIMM, 66 pins/chip (4X)
Bandwidth           13 Mbytes/sec                      1600 Mbytes/sec (120X)
Latency             225 ns                             52 ns (4X)
Block transfer      none                               Block transfers (page mode)

Latency Lags Bandwidth (last ~20 years)

[Figure: performance milestones plotted against relative latency improvement]

Performance Milestones
• Memory Module: 16 bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)

LANs: Archaic (Nostalgic) v. Modern (Newfangled)

                    Ethernet 802.3     Ethernet 802.3ae
Year of standard    1978               2003
Link speed          10 Mbits/s         10,000 Mbits/s (1000X)
Latency             3000 μsec          190 μsec (15X)
Media               Shared             Switched
Cabling             Coaxial cable      Category 5 copper wire

"Cat 5" is 4 twisted pairs in bundle

[Figure: cable cross-sections. Coaxial cable: plastic covering, braided outer conductor, insulator, copper core. Twisted pair: copper, 1 mm thick, twisted to avoid antenna effect.]

Latency Lags Bandwidth (last ~20 years)

[Figure: performance milestones plotted against relative latency improvement]

• Performance Milestones
• Ethernet: 10 Mb, 100 Mb, 1000 Mb, 10000 Mb/s (16x, 1000x)
• Memory Module: 16 bit plain DRAM, Page Mode DRAM, 32b, 64b, SDRAM, DDR SDRAM (4x, 120x)
• Disk: 3600, 5400, 7200, 10000, 15000 RPM (8x, 143x)
(latency = simple operation w/o contention, BW = best-case)
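The (latency, bandwidth) improvement pairs on the milestone slides can be recomputed from the archaic and modern figures in the comparison tables. The sketch below does that; the dictionary layout and function name are illustrative, and the slides round the results (for example, the Ethernet latency ratio is quoted as both 15X and 16x).

```python
# (old, new) pairs copied from the comparison tables. Units differ per row
# (MB/s and ms for disk, MB/s and ns for memory, Mbit/s and microseconds for
# Ethernet) but cancel out in the ratios.
MILESTONES = {
    "Disk":          {"bandwidth": (0.6, 86.0),     "latency": (48.3, 5.7)},
    "Memory module": {"bandwidth": (13.0, 1600.0),  "latency": (225.0, 52.0)},
    "Ethernet":      {"bandwidth": (10.0, 10000.0), "latency": (3000.0, 190.0)},
}

def improvements(entry):
    """Return (bandwidth improvement, latency improvement) as 'times better'."""
    bw_old, bw_new = entry["bandwidth"]
    lat_old, lat_new = entry["latency"]
    # Higher bandwidth is better, lower latency is better.
    return bw_new / bw_old, lat_old / lat_new

for name, entry in MILESTONES.items():
    bw, lat = improvements(entry)
    print(f"{name:14s} bandwidth x{bw:7.1f}   latency x{lat:5.1f}")
```

In every row the bandwidth improvement is far larger than the latency improvement, which is the point the slide title makes.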

CPUs: Archaic (Nostalgic) v. Modern (Newfangled)

1982 Intel 80286        2001 Intel Pentium 4

12.5 MHz
