




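Computer Organization & Design: The Hardware/Software Interface
2021/9/5

Reference
- Computer Organization & Design
  - In China the subject is known as 计算机组成原理 (Principles of Computer Organization); abroad it is also called "computer systems" or "computer principles".
- We use the 4th Edition; the 3rd English edition can also serve as a reference.
- Chinese translations of the 2nd, 3rd, and 4th editions (计算机组成和设计:硬件/软件接口):
  - 2nd edition: Tsinghua University Press
  - 3rd and 4th editions: China Machine Press
- Traditional computer organization textbooks are by authors such as Bai Zhongying (白中英), Wang Aiying (王爱英), and Tang Shuofei (唐朔飞).

Evaluation and Grades
- Class Participation: 10%
- Labs: 20%
- Homework Assignments: 10%
- Project: 20%
- Final Examinations: 40%

Chapters, hours, content, and notes
- Chapter 1: Overview (6 hours, Sections 1.1-1.5)
  - Content: computer history; hardware and software components; performance evaluation
  - Notes: CPI, MIPS, FLOPS; RISC, CISC
- Chapter 2: Instructions, the language of the hardware machine: the computer instruction set (8 hours, Sections 2.1-2.14)
  - Content: instruction set; assembly and disassembly; arithmetic and logic instructions; branch instructions and subroutines; addressing modes; compiling C
  - Notes: converting assembly instructions to machine code (assembly) and machine code back to assembly (disassembly). Postgraduate entrance exam supplement: instruction formats and types
- Chapter 3: Representation, conversion, and arithmetic of numbers in a computer (8 hours, Sections 3.1-3.6)
  - Content: data representation; integer addition and subtraction; integer multiplication and division; floating-point representation, addition, and subtraction
  - Notes: analysis and optimization of integer addition/subtraction algorithms; adder design; analysis of multiplication/division algorithms; floating-point numbers
- Chapter 4: The processor: datapath and controller design (14 hours, Sections 4.1-4.4)
  - Content: design of individual components; MIPS instruction set; ALU and ALU controller; single-cycle datapath; multi-cycle datapath; controller design
  - Notes: the new edition of the book covers only the single-cycle design and then moves straight to pipelining. Pipelining is left to the architecture course. To support the labs, multi-cycle and controller (FSM) material is added. Postgraduate entrance exam supplement: bus-based CPU design, microinstruction control
- Chapter 5: Memory hierarchy (8 hours, Sections 5.1-5.5)
  - Content: memory overview; bit expansion and word expansion; cache; virtual memory
  - Notes: postgraduate entrance exam supplement: memory organization, bit expansion, word expansion
- Chapter 6: Interfacing processors and peripheral devices (8 hours, Sections 6.1-6.6)
  - Content: I/O overview; disk systems; bus systems and arbitration; data communication: polling, interrupts, DMA
  - Notes: mainly conceptual
- Review (2 hours)

Course Objectives
- Understand modern computers, their evolution, and trade-offs at the HW/SW interface
  - Instruction Set Architecture
  - Computer Arithmetic
  - Performance and Metrics
  - Pipelining
- Understand the design of a modern computer system
  - Datapath design
  - Control design
  - Memory System Design
  - I/O System Design

What You Will Learn
- How are programs written in high-level languages (C or Java) translated into the language of the hardware, and how does hardware execute the resulting program?
- What is the interface between software and hardware, and how does software instruct hardware to perform needed functions?
- What determines the performance of a program, and how can software programmers and hardware designers improve performance?
- The basic operation of a computer: What is a computer? What does it do? How does it work?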
  - Primitive operations (instructions): arithmetic, instruction sequencing and processing, memory, input/output, etc.
- Understand the relationship between abstractions
  - What is done in hardware? What is done in software?
  - Interface design
  - High-level program to control signals (SW -> HW)
- Software performance depends on understanding the underlying HW

Chapter 1
Computer Abstractions and Technology

Contents of Chapter 1
1.1 Introduction
1.2 Computer Language and Software System
1.3 Computer Hardware System
1.4 Performance
1.5 Real Stuff: Manufacturing Pentium Chips
1.6 History of Computer Development

ENIAC (Electronic Numerical Integrator And Computer)
- Eckert and Mauchly
- 1st working fully electronic computer, 1946
- 18,000 vacuum tubes
- 1,800 instructions/sec
- 3,000 ft³

EDSAC (Electronic Delay Storage Automatic Calculator)
- Maurice Wilkes
- 1st electronic stored-program computer
- 650 instructions/sec
- 1,400 ft³
- [Photo: EDSAC 1 (1949)]

Mainframe Era: 1950s - 1960s
- Enabling tech: computers
- Big players: "Big Iron" (IBM, UNIVAC)
- Cost: $1M; target: businesses
- Using: COBOL, Fortran, timesharing OS
- [Diagram: Processor (CPU), I/O]
- [Photo: the mainframe era, IBM 360 (1970's)]

Minicomputer Era: 1970s
- Enabling tech: integrated circuits
- Big players: Digital, HP
- Cost: $10k; target: labs & universities
- Using: C, UNIX OS

PC Era: Mid 1980s - Mid 2000s
- Enabling tech: microprocessors
- Big players: Apple, IBM
- Cost: $1k; target: consumers (1/person)
- Using: Basic, Java, Windows OS

Intel 4004
- Introduced in 1971
- First microprocessor
- 2,250 transistors
- 12 mm²
- 108 kHz

Intel 8086
- 29,000 transistors
- 33 mm²
- 5 MHz
- Introduced in 1978
- Basic architecture of the IA32 PC

Intel 80486
- 1,200,000 transistors
- 81 mm²
- 25 MHz
- Introduced in 1989
- 1st pipelined implementation of IA32

Pentium
- 3,100,000 transistors
- 296 mm²
- 60 MHz
- Introduced in 1993
- 1st superscalar implementation of IA32

Pentium 4
- 55,000,000 transistors
- 146 mm²
- 3 GHz
- Introduced in 2000

Intel Core Duo
- 291,000,000 transistors
- 143 mm² (65 nm technology)
- 3 GHz
- Introduced in 2006
- [Die photo labels: Core 1, Core 2, Cache]

UltraSparc T2 (Niagara 2)
- 500,000,000 transistors
- 342 mm², 65 nm
- 1.2-1.4 GHz
- 8 cores, 64 threads
- 1 FPU per core
- Introduced in 2007
- [Die photo label: 1 core]

Modern computer systems

Post-PC Era: Late 2000s - ???
Personal Mobile Devices (PMD):
- Enabling tech: wireless networking, smartphones
- Big players: Apple, Nokia, …
- Cost: $500; target: consumers on the go
- Using: Objective C, Android OS

Post-PC Era: Late 2000s - ???
Cloud Computing:
- Enabling tech: local area networks, broadband Internet
- Big players: Amazon, Google, …
- Target: transient users or users who cannot afford high-end equipment

Post-PC Era: Late 2000s - ???
Datacenters and Warehouse Scale Computers (WSC):
- Enabling tech: local area networks, cheap servers
- Cost: $200M clusters + maintenance costs
- Target: Internet services and PMDs

- Advanced RISC Machine (ARM) instruction set inside the iPhone
- You will learn how to design and program a related RISC computer: MIPS

iPhone Innards
- [Diagram: 1 GHz ARM Cortex A8 processor, memory, I/O]
- You will learn about multiple processors, data level parallelism, caches
(EECS 370: Introduction to Computer Organization)

What next? Many-cores and GPUs
- Intel Polaris: 80 cores, experimental design
- Intel Larrabee: 16-40 cores (first generation cancelled recently)
- Nvidia: programmable GPU arrays (hundreds)

Classes of Computers
- Desktop / notebook computers
  - General purpose, variety of software
  - Subject to cost/performance tradeoff
- Server computers
  - Network based
  - High capacity, performance, reliability
  - Range from small servers to building sized
- Embedded computers
  - Hidden as components of systems
  - Stringent power/performance/cost constraints

The Processor Market
- Embedded growth >> desktop growth
- Where else are embedded processors found?

What next? Divergent embedded applications?
- Sensing, communication, multimedia, control

Contents of Chapter 1
1.1 Introduction
1.2 Below Your Program
1.3 Under the Covers
1.4 Performance
1.5 The Power Wall
1.6 History of Computer Development

1.2 Below Your Program
- Application software
  - Written in high-level language
- System software
  - Compiler: translates HLL code to machine code
  - Operating System: service code
    - Handling input/output
    - Managing memory and storage
    - Scheduling tasks & sharing resources
- Hardware
  - Processor, memory, I/O controllers

Levels of Program Code

High-level language program (in C):

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

C compiler (one-to-many)

Assembly language program (for MIPS):

    swap: sll $2, $5, 2
          add $2, $4, $2
          lw  $15, 0($2)
          lw  $16, 4($2)
          sw  $16, 0($2)
          sw  $15, 4($2)
          jr  $31

Assembler (one-to-one)

Machine (object, binary) code (for MIPS):

    00000000000001010001000010000000
    00000000100000100001000000100000
    ...

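
To make the one-to-one step from assembly to machine code more concrete, here is a minimal sketch that splits a 32-bit MIPS R-type word into its standard fields (op, rs, rt, rd, shamt, funct). The helper name `decode_r_type` is hypothetical and not part of the slides; the bit string is the second word of the machine code listing, which encodes `add $2, $4, $2`.

```python
def decode_r_type(word: str) -> dict:
    """Split a 32-bit MIPS R-type instruction (as a bit string) into its fields."""
    bits = word.replace(" ", "")
    assert len(bits) == 32, "a MIPS instruction is exactly 32 bits"
    # Field widths of the MIPS R-type format, most significant bits first.
    widths = [("op", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6)]
    fields, pos = {}, 0
    for name, width in widths:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields


# Second word of the swap machine code: add $2, $4, $2
add_word = "00000000100000100001000000100000"
fields = decode_r_type(add_word)
print(fields)
```

Reading the result back against the assembly line: `rs` and `rt` name the two source registers ($4 and $2), `rd` names the destination ($2), and `funct` selects the add operation within the R-type group.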
Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Devices (Input, Output), Network]

Input Device Inputs Object Code
[Figure: the object code enters through the input device]

Object Code Stored in Memory

    000000 00000 00101 0001000010000000
    000000 00100 00010 0001000000100000
    100011 00010 01111 0000000000000000
    100011 00010 10000 0000000000000100
    101011 00010 10000 0000000000000000
    101011 00010 01111 0000000000000100
    000000 11111 00000 0000000000001000

Processor Fetches an Instruction
- Processor fetches an instruction from memory

Control Decodes the Instruction
- Control decodes the instruction to determine what to execute

    000000 00100 00010 0001000000100000

Datapath Executes the Instruction
- Datapath executes the instruction as directed by control

    000000 00100 00010 0001000000100000

- Contents of Reg #4 ADD contents of Reg #2; results put in Reg #2

What Happens Next?
- Fetch -> Decode -> Exec
- Processor fetches the next instruction from memory
- How does it know which location in memory to fetch from next?

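
The usual answer to the question above is a program counter (PC): a register that holds the address of the next instruction. The following is a toy sketch of the fetch, decode, execute cycle under simplifying assumptions: instructions are stored already decoded as tuples, the second `add` and the `halt` pseudo-instruction are hypothetical additions for illustration, and the initial register values are made up.

```python
# Toy instruction memory: byte address -> already-decoded instruction.
memory = {
    0: ("add", 2, 4, 2),  # add $2, $4, $2 -> regs[2] = regs[4] + regs[2]
    4: ("add", 2, 2, 2),  # hypothetical second instruction
    8: ("halt",),         # hypothetical stop marker, not a real MIPS instruction
}

regs = [0] * 32
regs[4], regs[2] = 100, 8  # made-up initial register contents

pc = 0        # program counter: address of the next instruction to fetch
trace = []    # addresses fetched, in order

while True:
    instr = memory[pc]    # fetch the instruction the PC points at
    trace.append(pc)
    pc += 4               # MIPS instructions are 4 bytes, so the next one is at PC + 4
    op = instr[0]         # decode: look at the operation
    if op == "halt":
        break
    if op == "add":       # execute: the datapath performs the addition
        _, rd, rs, rt = instr
        regs[rd] = regs[rs] + regs[rt]

print(trace, regs[2])
```

In this sketch the PC simply advances by 4 each cycle; branch and jump instructions (such as `jr $31` in the swap routine) work by writing a different address into the PC.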
Advantages of Higher-Level Languages?
Higher-level languages
- Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
- Improve programmer productivity: more understandable code that is easier to debug and validate
- Improve program maintainability
- Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
- Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
As a result, very little programming is done today at the assembler level.

Categorize software by its use
- Systems software: aimed at programmers
- Applications software: aimed at users
- Learning the hardware helps you program systems software
- Systems software includes
  - Operating system
  - Compiler
  - Assembler
  - …

An example of the decomposability of computer systems
- Software
  - Applications software: LaTeX
  - Systems software
    - Compilers: gcc
    - Assemblers: as
    - Operating systems: virtual memory, file system, I/O device drivers

The System Unit
- What are common components inside the system unit?
  - Processor
  - Memory module
  - Expansion cards: sound card, modem card, video card, network interface card
  - Ports and connectors
- What is the motherboard?

Inside the Processor
- AMD Barcelona: 4 processor cores

AMD's Barcelona Multicore Chip
- Four out-of-order cores on one chip
- 1.9 GHz clock rate
- 65 nm technology
- Three levels of caches (L1, L2, L3) on chip
- Integrated Northbridge

The five classic components of a computer

Five Classic Components
Since the 1940's, computers have 5 classic components:
- Input devices: keyboard, mouse, …
- Output devices: display, printer, …
- Storage devices
  - Volatile memory devices: DRAM, SRAM, …
  - Permanent storage devices: magnetic, optical, and flash disks, …
- Datapath
- Control
  - Together, datapath and control are called the Processor
Newly added 6th component: Network
[Figure: Computer = Processor (Control, Datapath), Memory, Devices (Input, Output)]

A simplified view of hardware and software as hierarchical layers
[Figure: Hardware, Systems software, Applications software]

Machine Structures
[Figure: layers from software down to hardware]
- Software: Application (ex: browser), Compiler, Assembler, Operating System
- Instruction Set Architecture
- Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; Transistors

Levels of Representation/Interpretation

Higher-Level Language Program (e.g. C):

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

Compiler

Assembly Language Program (e.g. MIPS):

    lw $t0, 0($2)
    lw $t1, 4($2)
    sw $t1, 0($2)
    sw $t0, 4($2)

Assembler

Machine Language Program (MIPS):

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

Machine Interpretation

Hardware Architecture Description (e.g. block diagrams)

Architecture Implementation

Logic Circuit Description (Circuit Schematic Diagrams)

What is "Computer Architecture"?
- Computer Architecture = Instruction Set Architecture + Computer Organization
- Instruction Set Architecture (ISA): WHAT the computer does (logical view)
- Computer Organization: HOW the ISA is implemented (physical view)
- We will study both in this course

Instruction Set Architecture (ISA)
- Is a subset of Computer Architecture
- Definition by Amdahl, Blaauw, and Brooks (1964):
  "… the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
- An ISA encompasses …
  - Instructions and instruction formats
  - Data types, encodings, and representations
  - Programmable storage: registers and memory
  - Addressing modes: accessing instructions and data
  - Handling exceptional conditions

Instruction Set Architecture (cont'd)
- Critical interface between hardware and software
- Standardizes instructions, machine language bit patterns, etc.
- Advantage: different implementations of the same architecture
- Disadvantage: sometimes prevents using new innovations

Examples (versions) | Introduced in
Intel (8086, 80386, Pentium, ...) | 1978
IBM Power (Power 2, 3, 4, 5) | 1985
HP PA-RISC (v1.1, v2.0) | 1986
MIPS (MIPS I, II, III, IV, V) | 1986
Sun Sparc (v8, v9) | 1987
Digital Alpha (v1, v3) | 1992
PowerPC (601, 604, …) | 1993

Computer Organization
- Realization of the Instruction Set Architecture
- Characteristics of principal components
  - Registers, ALUs, FPUs, caches, ...
- Ways in which these components are interconnected
- Information flow between components
- Means by which such information flow is controlled
- Register Transfer Level (RTL) description

Abstractions
- Lower-level details are hidden from higher levels
- Instruction set architecture: the interface between hardware and lowest-level software
- Many implementations of varying cost and performance can run identical software

1.4 Performance
- Performance is the key to understanding the underlying motivation for the hardware and its organization
- Measure, report, and summarize performance to enable users to
  - make intelligent choices
  - see through the marketing hype!
- Why is some hardware better than others for different programs?
- What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
- How does the machine's instruction set affect performance?

Airplane | Passengers | Range (mi) | Speed (mph)
Boeing 737-100 | 101 | 630 | 598
Boeing 747 | 470 | 4150 | 610
BAC/Sud Concorde | … | … | …
Douglas DC-8-50 | 146 | 8720 | 544

- How much faster is the Concorde compared to the 747?
- How much bigger is the Boeing 747 than the Douglas DC-8?
- So which of these airplanes has the best performance?!

Computer Performance: TIME, TIME, TIME!!!
What do we measure? Define performance…
- Response time (elapsed time, latency): individual user concerns…
  - How long does it take for my job to run?
  - How long does it take to execute (start to finish) my job?
  - How long must I wait for the database query?
- Throughput: systems manager concerns…
  - How many jobs can the machine run at once?
  - What is the average execution rate?
  - How much work is getting done?
- If we upgrade a machine with a new processor, what do we increase?
- If we add a new machine to the lab, what do we increase?

Response Time and Throughput
- Response time
  - How long it takes to do a task
  - Important to individual users
- Throughput
  - Total work done per unit time, e.g., tasks/transactions/… per hour
  - Important to datacenter managers
- How are response time & throughput affected by
  - Replacing the processor with a faster version?
  - Adding more processors?
- We'll focus on response time for now…

Performance
=
1/Execution
Time“X
is
n
time
faster
than
Y”Example:
time
taken
to
run
a
program10s
on
A,
15s
on
BExecution
TimeB
/
ExecutionTimeA=
15s
/
10s
=
1.5So
A
is
1.5
times
faster
than
B2021/9/570Elapsed
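
As a quick check of the relative performance definition, the sketch below computes n from two execution times. The function name `relative_performance` is hypothetical; the 10 s and 15 s figures are the ones from the example.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    return time_y / time_x


# The program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")
```

Note that the slower machine's time goes in the numerator: a ratio above 1 means the first machine is the faster one.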
Elapsed Time
- Counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish
- A useful number, but often not good for comparison purposes
- elapsed time = CPU time + wait time (I/O, other programs, etc.)

CPU time
- Doesn't count waiting for I/O or time spent running other programs
- Can be divided into user CPU time and system CPU time (OS calls)
- CPU time = user CPU time + system CPU time
- elapsed time = user CPU time + system CPU time + wait time

Our focus: user CPU time (CPU execution time or, simply, execution time)
- Time spent executing the lines of code that are in our program

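
The split between elapsed time and CPU time can be observed directly. The following is a rough sketch using Python's standard `time.perf_counter` (wall-clock time) and `os.times` (user and system CPU time of the current process); the workload and the 0.05 s sleep are arbitrary choices for illustration, and the measured numbers will vary from machine to machine.

```python
import os
import time

start_wall = time.perf_counter()
start_cpu = os.times()

# CPU-bound work in our own code: counts toward user CPU time.
total = sum(i * i for i in range(200_000))
# Sleeping stands in for waiting on I/O: counts toward elapsed time only.
time.sleep(0.05)

end_cpu = os.times()
elapsed = time.perf_counter() - start_wall

user_cpu = end_cpu.user - start_cpu.user        # time spent in our code
system_cpu = end_cpu.system - start_cpu.system  # time spent in OS calls on our behalf
cpu = user_cpu + system_cpu
wait = elapsed - cpu                            # everything else: sleeping, other programs

print(f"elapsed={elapsed:.3f}s user={user_cpu:.3f}s system={system_cpu:.3f}s wait={wait:.3f}s")
```

Because the sleep adds to elapsed time without consuming CPU, the CPU time reported here should come out smaller than the elapsed time, which is why elapsed time alone is a poor basis for comparing processors.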
CPU Clocking: Review
- Operation of digital hardware is governed by a constant-rate clock
- [Figure: clock waveform; within each clock period, data transfer and computation are followed by a state update]
- Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
- Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
- Clock rate (clock cycles per second, in MHz or GHz) is the inverse of clock cycle time (clock period): CC = 1 / CR

Clock cycle | Clock rate
10 ns | 100 MHz
5 ns | 200 MHz
2 ns | 500 MHz
1 ns ($10^{-9}$ s) | 1 GHz ($10^{9}$ Hz)
500 ps | 2 GHz
250 ps | 4 GHz
200 ps | 5 GHz

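
The conversion between clock cycle time and clock rate is a plain reciprocal, which the short sketch below reproduces for a few of the listed values. The function name `clock_rate_hz` is hypothetical.

```python
def clock_rate_hz(cycle_time_s: float) -> float:
    """Clock rate is the inverse of the clock cycle time: CR = 1 / CC."""
    return 1.0 / cycle_time_s


# A few clock cycle times, expressed in seconds.
cycle_times = {"10 ns": 10e-9, "1 ns": 1e-9, "250 ps": 250e-12}

for label, cc in cycle_times.items():
    print(f"{label} clock cycle => {clock_rate_hz(cc) / 1e6:.0f} MHz clock rate")
```

Keeping every quantity in base units (seconds and hertz) and converting only when printing avoids the common slip of mixing nanoseconds with megahertz.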
Performance Equation I

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

CPU execution time for a program = CPU clock cycles for a program × clock cycle time

CPU Time
- So, to improve performance one can either:
  - reduce the number of cycles for a program, or
  - reduce the clock cycle time, or, equivalently, increase the clock rate
- Important point: changing the cycle time often changes the number of cycles required for various instructions, because it means changing the hardware design
- The hardware designer must often trade off clock rate against cycle count
- Many techniques that decrease the number of clock cycles also increase the clock cycle time

CPU Time Example
A program runs on computer A, which has a 2 GHz clock, in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$

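
The CPU time example can be replayed step by step. This is a minimal sketch that only restates the arithmetic of the example; the variable names are chosen for illustration.

```python
cpu_time_a = 10.0    # seconds the program takes on computer A
clock_rate_a = 2e9   # computer A runs at 2 GHz
cpu_time_b = 6.0     # target run time on computer B, in seconds
cycle_ratio = 1.2    # B needs 1.2 times as many clock cycles as A

# Clock cycles = CPU time x clock rate
clock_cycles_a = cpu_time_a * clock_rate_a
clock_cycles_b = cycle_ratio * clock_cycles_a

# Clock rate = clock cycles / CPU time
clock_rate_b = clock_cycles_b / cpu_time_b

print(f"Computer B needs a {clock_rate_b / 1e9:.1f} GHz clock")
```

The target time drops by a factor of 10/6 while the cycle count grows by 1.2, so the clock rate has to rise by their product, a factor of 2.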
Instruction Count and CPI

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

- Average cycles per instruction
  - Determined by CPU hardware
  - If different instructions have different CPI, the average CPI is affected by the instruction mix

CPU performance is dependent upon three characteristics:
- clock cycle (or rate)
- clock cycles per instruction
- instruction count
It is difficult to change one parameter in complete isolation from the others because the basic technologies involved in changing each characteristic are interdependent:
- Clock cycle time: hardware technology and organization
- CPI: organization and instruction set architecture
- Instruction count: instruction set architecture and compiler technology

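CPI Example
- Computer A: Cycle Time = 250 ps, CPI = 2.0
- Computer B: Cycle Time = 500 ps, CPI = 1.2
- Same ISA
- Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

A is faster…

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

…by this much
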
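
The comparison of computers A and B follows directly from CPU Time = Instruction Count × CPI × Cycle Time. In the sketch below the function name `cpu_time_ps` and the instruction count of one million are assumptions for illustration; because both machines share the ISA, any instruction count cancels out of the ratio.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


instruction_count = 1_000_000  # arbitrary; the same program runs on both machines

time_a = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250.0)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500.0)

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")
```

Computer B has the lower CPI, yet A wins because its cycle time is half as long; neither number alone decides the comparison.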
CPI in More Detail
- If different instruction classes take different numbers of cycles:

$$\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$$

- Weighted average CPI:

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$$

  where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class i.

CPI Example
- Alternative compiled code sequences using instructions in classes A, B, C. What is avg. CPI?

Class | A | B | C
CPI for class | 1 | 2 | 3
IC in sequence 1 | 2 | 1 | 2
IC in sequence 2 | 4 | 1 | 1

- Sequence 1: IC = 5
  - Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  - Avg. CPI = 10/5 = 2.0
- Sequence 2: IC = 6
  - Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  - Avg. CPI = 9/6 = 1.5

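
The weighted-average CPI calculation for the two code sequences can be written as a short sketch. The function name `average_cpi` and the dictionary layout are assumptions for illustration; the per-class CPIs and instruction counts are the ones from the example.

```python
cpi_per_class = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class for the two compiled code sequences.
sequences = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def average_cpi(counts: dict, cpi: dict) -> tuple:
    """Return (clock cycles, instruction count, average CPI) for one sequence."""
    cycles = sum(cpi[cls] * n for cls, n in counts.items())
    instruction_count = sum(counts.values())
    return cycles, instruction_count, cycles / instruction_count


results = {seq: average_cpi(counts, cpi_per_class) for seq, counts in sequences.items()}
for seq, (cycles, ic, avg) in results.items():
    print(f"Sequence {seq}: IC = {ic}, clock cycles = {cycles}, avg. CPI = {avg}")
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not predict which sequence runs faster.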
Performance Summary
The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

 | Instruction_count | CPI | clock_cycle
Algorithm | X | X |
Programming language | X | X |
Compiler | X | X |
ISA | X | X | X
Core organization | | X | X
Technology | | | X

A Simple Example

Op | Freq | CPI_i | Freq × CPI_i | Load in 2 cycles | Branch 1 cycle less | Two ALU at once
ALU | 50% | 1 | .5 | .5 | .5 | .25
Load | 20% | 5 | 1.0 | .4 | 1.0 | 1.0
Store | 10% | 3 | .3 | .3 | .3 | .3
Branch | 20% | 2 | .4 | .4 | .2 | .4
Σ | | | 2.2 | 1.6 | 2.0 | 1.95

- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
  - CPU time new = 1.6 × IC × CC, so 2.2/1.6 means 37.5% faster
- How does this compare with using branch prediction to shave a cycle off the branch time?
  - CPU time new = 2.0 × IC × CC, so 2.2/2.0 means 10% faster
- What if two ALU instructions could be executed at once?
  - CPU time new = 1.95 × IC × CC, so 2.2/1.95 means 12.8% faster

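
The three what-if questions in the simple example all reuse the same weighted sum, so they are easy to recompute. In this sketch the helper names `overall_cpi`, `with_cpi`, and `percent_faster` are hypothetical; the frequencies and CPIs come from the example, and instruction count and clock cycle time are assumed unchanged so they cancel in each ratio.

```python
# Instruction mix: op -> (relative frequency, CPI for that op).
base_mix = {
    "ALU": (0.5, 1.0),
    "Load": (0.2, 5.0),
    "Store": (0.1, 3.0),
    "Branch": (0.2, 2.0),
}


def overall_cpi(mix: dict) -> float:
    """Weighted average CPI: sum of frequency x CPI over all op classes."""
    return sum(freq * cpi for freq, cpi in mix.values())


def with_cpi(mix: dict, op: str, new_cpi: float) -> dict:
    """Return a copy of the mix with one op's CPI replaced."""
    changed = dict(mix)
    changed[op] = (mix[op][0], new_cpi)
    return changed


def percent_faster(old_cpi: float, new_cpi: float) -> float:
    """Speedup in percent, with IC and clock cycle time held constant."""
    return (old_cpi / new_cpi - 1.0) * 100.0


cpi_base = overall_cpi(base_mix)
cpi_cache = overall_cpi(with_cpi(base_mix, "Load", 2.0))     # better data cache
cpi_branch = overall_cpi(with_cpi(base_mix, "Branch", 1.0))  # branch prediction
cpi_alu = overall_cpi(with_cpi(base_mix, "ALU", 0.5))        # two ALU ops at once

for name, cpi in [("cache", cpi_cache), ("branch", cpi_branch), ("ALU", cpi_alu)]:
    print(f"{name}: CPI {cpi:.2f}, {percent_faster(cpi_base, cpi):.1f}% faster")
```

Although ALU operations are the most frequent class, the load improvement helps most, because loads contribute the largest share of the total cycles.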
Workloads and Benchmarks
- Benchmarks: a set of programs that form a "workload" specifically chosen to measure performance
- SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks, starting with SPEC89. The latest is SPEC CPU2006, which consists of 12 integer benchmark…