Computer Abstractions and Technology

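Computer Organization & Design: The Hardware/Software Interface
2021/9/5

Reference: Computer Organization & Design
- In China this course is called 计算机组成原理 (Principles of Computer Organization); elsewhere it is also called "computer system" or "computer principles".
- We use the 4th Edition. You may also consult the English 3rd edition, or the Chinese translations of the 2nd, 3rd, and 4th editions, titled 计算机组成和设计:硬件/软件接口 (Computer Organization and Design: The Hardware/Software Interface):
  - 2nd edition: Tsinghua University Press
  - 3rd and 4th editions: China Machine Press
- Authors of traditional Chinese computer-organization textbooks include Bai Zhongying (白中英), Wang Aiying (王爱英), and Tang Shuofei (唐朔飞).

Evaluation and Grades
- Class Participation: 10%
- Labs: 20%
- Homework Assignments: 10%
- Project: 20%
- Final Examinations: 40%

Course Schedule
Chapter 1  Overview (6 hours)
  Content: sections 1.1-1.5; computer history; hardware and software components; performance evaluation
  Notes: CPI, MIPS, FLOPS; RISC, CISC
Chapter 2  Instructions: Language of the Computer (the computer instruction set) (8 hours)
  Content: sections 2.1-2.14; instruction sets; assembly and disassembly; arithmetic and logical instructions; branch instructions and subroutines; addressing modes; compiling C
  Notes: translating assembly instructions into machine code (assembly) and machine code back into assembly (disassembly). Graduate entrance exam supplement: instruction formats and types
Chapter 3  Representation, Conversion, and Arithmetic of Numbers in the Computer (8 hours)
  Content: sections 3.1-3.6; data representation; integer addition and subtraction; integer multiplication and division; floating-point representation, addition and subtraction
  Notes: analysis and optimization of integer add/subtract algorithms; adder design; analysis of multiply/divide algorithms; floating-point numbers
Chapter 4  The Processor: Datapath and Control Design (14 hours)
  Content: sections 4.1-4.4; design of individual components; the MIPS instruction set; the ALU and ALU control; single-cycle datapath; multicycle datapath; control unit design
  Notes: the new edition covers only the single-cycle design before moving on to pipelining, which is left to the computer architecture course. To support the labs, multicycle datapath and control unit (FSM) material is added. Graduate entrance exam supplement: bus-based CPU design, microprogrammed control
Chapter 5  Memory Hierarchy (8 hours)
  Content: sections 5.1-5.5; memory overview; bit expansion and word expansion; cache; virtual memory
  Notes: graduate entrance exam supplement: memory organization, bit expansion, word expansion
Chapter 6  Interfacing Processors and Peripherals (8 hours)
  Content: sections 6.1-6.6; I/O overview; disk systems; buses and bus arbitration; data transfer: polling, interrupts, DMA
  Notes: mainly conceptual
Review (2 hours)

Course Objectives
- Understand modern computers, their evolution, and the trade-offs at the HW/SW interface:
  - Instruction Set Architecture
  - Computer Arithmetic
  - Performance and Metrics
  - Pipelining
- Understand the design of a modern computer system:
  - Datapath design
  - Control design
  - Memory System Design
  - I/O System Design

What You Will Learn
- How are programs written in high-level languages (C or Java) translated into the language of the hardware, and how does the hardware execute the resulting program?
- What is the interface between software and hardware, and how does software instruct hardware to perform needed functions?
- What determines the performance of a program, and how can software programmers and hardware designers improve performance?
- The basic operation of a computer: What is a computer? What does it do? How does it work?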

  - Primitive operations (instructions): arithmetic, instruction sequencing and processing, memory, input/output, etc.
- Understand the relationship between abstractions
  - What is done in hardware? What is done in software?
  - Interface design: from a high-level program to control signals (SW -> HW)
  - Software performance depends on understanding the underlying HW

Chapter 1
Computer Abstractions and Technology

Contents of Chapter 1
1.1 Introduction
1.2 Computer Language and Software System
1.3 Computer Hardware System
1.4 Performance
1.5 Real Stuff: Manufacturing Pentium Chips
1.6 History of Computer Development

ENIAC: Electronic Numerical Integrator And Computer
- Eckert and Mauchly
- 1st working fully electronic computer
- 1946
- 18,000 vacuum tubes
- 1,800 instructions/sec
- 3,000 ft³

EDSAC 1 (1949): Electronic Delay Storage Automatic Calculator
- Maurice Wilkes
- 1st electronic stored-program computer
- 650 instructions/sec
- 1,400 ft³

Mainframe Era: 1950s-1960s
[Figure: processor (CPU) connected to I/O devices]
- Enabling Tech: Computers
- Big Players: "Big Iron" (IBM, UNIVAC)
- Cost: $1M; Target: Businesses
- Using: COBOL, Fortran, timesharing OS
The mainframe era: IBM 360 (1970s)

Minicomputer Era: 1970s
- Enabling Tech: Integrated circuits
- Big Players: Digital, HP
- Cost: $10k; Target: Labs & universities
- Using: C, UNIX OS

PC Era: Mid 1980s - Mid 2000s
- Enabling Tech: Microprocessors
- Big Players: Apple, IBM
- Cost: $1k; Target: Consumers (1/person)
- Using: Basic, Java, Windows OS

Intel 4004
- Introduced in 1971
- First microprocessor
- 2,250 transistors
- 12 mm²
- 108 kHz

Intel 8086
- 29,000 transistors
- 33 mm²
- 5 MHz
- Introduced in 1978
- Basic architecture of the IA32 PC

Intel 80486
- 1,200,000 transistors
- 81 mm²
- 25 MHz
- Introduced in 1989
- 1st pipelined implementation of IA32

Pentium
- 3,100,000 transistors
- 296 mm²
- 60 MHz
- Introduced in 1993
- 1st superscalar implementation of IA32

Pentium 4
- 55,000,000 transistors
- 146 mm²
- 3 GHz
- Introduced in 2000

Intel Core Duo
- 291,000,000 transistors
- 143 mm² (65 nm technology)
- 3 GHz
- Introduced in 2006
[Die photo labels: Core 1, Core 2, Cache]

UltraSparc T2 (Niagara 2)
- 500,000,000 transistors
- 342 mm², 65 nm
- 1.2-1.4 GHz
- 8 cores, 64 threads
- 1 FPU per core
- Introduced in 2007
[Die photo label: 1 core]

Modern computer systems

Post-PC Era: Late 2000s - ???
Personal Mobile Devices (PMD):
- Enabling Tech: Wireless networking, smartphones
- Big Players: Apple, Nokia, …
- Cost: $500; Target: Consumers on the go
- Using: Objective C, Android OS

Post-PC Era: Late 2000s - ???
Cloud Computing:
- Enabling Tech: Local Area Networks, broadband Internet
- Big Players: Amazon, Google, …
- Target: Transient users or users who cannot afford high-end equipment

Post-PC Era: Late 2000s - ???
Datacenters and Warehouse Scale Computers (WSC):
- Enabling Tech: Local Area Networks, cheap servers
- Cost: $200M clusters + maintenance costs
- Target: Internet services and PMDs

Advanced RISC Machine (ARM) instruction set inside the iPhone
You will learn how to design and program a related RISC computer: MIPS

iPhone Innards
[Figure labels: 1 GHz ARM Cortex A8; Processor; Memory; I/O]
You will learn about multiple processors, data level parallelism, caches
EECS 370: Introduction to Computer Organization

What next? Many-cores and GPUs
- Intel Polaris: 80 cores, experimental design
- Intel Larrabee: 16-40 cores (first generation cancelled recently)
- Nvidia: programmable GPU arrays (hundreds of cores)

Classes of Computers
- Desktop / Notebook Computers
  - General purpose, variety of software
  - Subject to cost/performance tradeoff
- Server Computers
  - Network based
  - High capacity, performance, reliability
  - Range from small servers to building sized
- Embedded Computers
  - Hidden as components of systems
  - Stringent power/performance/cost constraints
The Processor Market: embedded growth >> desktop growth

Where else are embedded processors found?
What next? Divergent embedded applications? Sensing, communication, multimedia, control

Contents of Chapter 1
1.1 Introduction
1.2 Below Your Program
1.3 Under the Covers
1.4 Performance
1.5 The Power Wall
1.6 History of Computer Development

1.2 Below Your Program
- Application software
  - Written in a high-level language
- System software
  - Compiler: translates HLL code to machine code
  - Operating System: service code
    - Handling input/output
    - Managing memory and storage
    - Scheduling tasks & sharing resources
- Hardware
  - Processor, memory, I/O controllers

Levels of Program Code
High-level language program (in C):

    /* Exchange v[k] and v[k+1] */
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    ↓ C compiler (one-to-many)

Assembly language program (for MIPS):

    swap:
        sll  $2, $5, 2      # $2 = k * 4 (byte offset of v[k]); $5 holds k
        add  $2, $4, $2     # $2 = address of v[k]; $4 holds v
        lw   $15, 0($2)     # $15 = v[k]
        lw   $16, 4($2)     # $16 = v[k+1]
        sw   $16, 0($2)     # v[k] = $16
        sw   $15, 4($2)     # v[k+1] = $15
        jr   $31            # return to the caller

    ↓ Assembler (one-to-one)

Machine (object, binary) code (for MIPS):

    00000000000001010001000010000000
    00000000100000100001000000100000
    ...

Major Components of a Computer
[Figure: Processor (Control, Datapath), Memory, Devices (Input, Output), Network]

Input Device Inputs Object Code
[Figure: the object code of swap enters the computer through an input device]

Object Code Stored in Memory
[Figure: the object code now resides in memory. Each word is grouped as opcode (6 bits), rs (5 bits), rt (5 bits), and the remaining 16 bits; the mnemonics come from the swap program.]

    000000 00000 00101 0001000010000000    sll $2, $5, 2
    000000 00100 00010 0001000000100000    add $2, $4, $2
    100011 00010 01111 0000000000000000    lw  $15, 0($2)
    100011 00010 10000 0000000000000100    lw  $16, 4($2)
    101011 00010 10000 0000000000000000    sw  $16, 0($2)
    101011 00010 01111 0000000000000100    sw  $15, 4($2)
    000000 11111 00000 0000000000001000    jr  $31

Processor Fetches an Instruction
Processor fetches an instruction from memory.

Control Decodes the Instruction
Control decodes the instruction to determine what to execute:

    000000 00100 00010 0001000000100000

Datapath Executes the Instruction
Datapath executes the instruction as directed by control: the contents of Reg #4 are added (ADD) to the contents of Reg #2, and the result is put in Reg #2.

What Happens Next?
[Figure: the processor repeats the Fetch, Decode, Exec cycle over the instructions stored in memory]

What Happens Next?
Processor fetches the next instruction from memory.
How does it know which location in memory to fetch from next?

Advantages of Higher-Level Languages?
Higher-level languages
- Allow the programmer to think in a more natural language and for their intended use (Fortran for scientific computation, Cobol for business programming, Lisp for symbol manipulation, Java for web programming, …)
- Improve programmer productivity: more understandable code that is easier to debug and validate
- Improve program maintainability
- Allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine)
- Emergence of optimizing compilers that produce very efficient assembly code optimized for the target machine
As a result, very little programming is done today at the assembler level.

Categorize software by its use
- Systems software: aimed at programmers
- Applications software: aimed at users
Learning the hardware lets you program systems software.
Systems software includes: Operating System, Compiler, Assembler, …

An example of the decomposability of computer systems
[Figure: applications software (LaTeX) built on systems software: compilers (gcc), assemblers (as), and the operating system (virtual memory, file system, I/O device drivers)]

1.3 Under the Covers

The System Unit
What are common components inside the system unit?
- Processor
- Memory module
- Expansion cards: sound card, modem card, video card, network interface card
- Ports and connectors
What is the motherboard?

Inside the Processor
AMD Barcelona: 4 processor cores

AMD's Barcelona Multicore Chip
- Four out-of-order cores on one chip
- 1.9 GHz clock rate
- 65 nm technology
- Three levels of caches (L1, L2, L3) on chip
- Integrated Northbridge

The five classic components of a computer

Five Classic Components
Since the 1940s, computers have had 5 classic components:
- Input devices: keyboard, mouse, …
- Output devices: display, printer, …
- Storage devices
  - Volatile memory devices: DRAM, SRAM, …
  - Permanent storage devices: magnetic, optical, and flash disks, …
- Datapath
- Control
Together, the datapath and control are called the processor.
Newly added 6th component: Network
[Figure: Computer = Processor (Control, Datapath) + Memory + Devices (Input, Output)]

A simplified view of hardware and software as hierarchical layers
[Figure: concentric layers, from the inside out: Hardware, Systems software, Applications software]

Machine Structures
Software:
  Application (ex: browser)
  Compiler, Operating System
  Assembler
Instruction Set Architecture
Hardware:
  Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  Transistors

Levels of Representation/Interpretation
High-Level Language Program (e.g. C):

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

    ↓ Compiler

Assembly Language Program (e.g. MIPS):

    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)

    ↓ Assembler

Machine Language Program (MIPS), illustrative bit patterns rather than the actual encodings:

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

    ↓ Machine Interpretation

Hardware Architecture Description (e.g. block diagrams)

    ↓ Architecture Implementation

Logic Circuit Description (Circuit Schematic Diagrams)

What is "Computer Architecture"?
Computer Architecture = Instruction Set Architecture + Computer Organization
- Instruction Set Architecture (ISA): WHAT the computer does (logical view)
- Computer Organization: HOW the ISA is implemented (physical view)
We will study both in this course.

Instruction Set Architecture (ISA)
- Is a subset of Computer Architecture
- Definition by Amdahl, Blaauw, and Brooks, 1964:
  "… the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
- An ISA encompasses …
  - Instructions and Instruction Formats
  - Data Types, Encodings, and Representations
  - Programmable Storage: Registers and Memory
  - Addressing Modes: Accessing Instructions and Data
  - Handling Exceptional Conditions

Instruction Set Architecture cont'd
- Critical interface between hardware and software
- Standardizes instructions, machine language bit patterns, etc.
- Advantage: different implementations of the same architecture
- Disadvantage: sometimes prevents using new innovations

Examples (versions)                  Introduced in
Intel (8086, 80386, Pentium, ...)    1978
IBM Power (Power 2, 3, 4, 5)         1985
HP PA-RISC (v1.1, v2.0)              1986
MIPS (MIPS I, II, III, IV, V)        1986
Sun Sparc (v8, v9)                   1987
Digital Alpha (v1, v3)               1992
PowerPC (601, 604, …)                1993

Computer Organization
- Realization of the Instruction Set Architecture
- Characteristics of principal components: registers, ALUs, FPUs, caches, ...
- Ways in which these components are interconnected
- Information flow between components
- Means by which such information flow is controlled
- Register Transfer Level (RTL) description

Abstractions
- Lower-level details are hidden from higher levels
- Instruction set architecture: the interface between hardware and the lowest-level software
- Many implementations of varying cost and performance can run identical software

1.4 Performance
- Performance is the key to understanding the underlying motivation for the hardware and its organization
- Measure, report, and summarize performance to enable users to make intelligent choices and see through the marketing hype!
- Why is some hardware better than others for different programs?
- What factors of system performance are hardware related? (e.g., do we need a new machine, or a new operating system?)
- How does the machine's instruction set affect performance?

Airplane             Passengers   Range (mi)   Speed (mph)
Boeing 737-100       101          630          598
Boeing 747           470          4150         610
BAC/Sud Concorde     …            …            …
Douglas DC-8-50      146          8720         544

- How much faster is the Concorde compared to the 747?
- How much bigger is the Boeing 747 than the Douglas DC-8?
- So which of these airplanes has the best performance?!

Computer Performance: TIME, TIME, TIME!!!
What do we measure? Define performance….
- Response Time (elapsed time, latency): individual user concerns…
  - How long does it take for my job to run?
  - How long does it take to execute (start to finish) my job?
  - How long must I wait for the database query?
- Throughput: systems manager concerns…
  - How many jobs can the machine run at once?
  - What is the average execution rate?
  - How much work is getting done?
- If we upgrade a machine with a new processor, what do we increase?
- If we add a new machine to the lab, what do we increase?

Response Time and Throughput
- Response time
  - How long it takes to do a task
  - Important to individual users
- Throughput
  - Total work done per unit time, e.g., tasks/transactions/… per hour
  - Important to datacenter managers
- How are response time & throughput affected by
  - Replacing the processor with a faster version?
  - Adding more processors?
- We'll focus on response time for now…

Relative Performance
- Define Performance = 1/Execution Time
- "X is n times faster than Y" means

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution Time}_Y}{\text{Execution Time}_X} = n$$

- Example: time taken to run a program: 10 s on A, 15 s on B

$$\frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{15\,\text{s}}{10\,\text{s}} = 1.5$$

  So A is 1.5 times faster than B.

Elapsed Time
- Counts everything (disk and memory accesses, waiting for I/O, running other programs, etc.) from start to finish
- A useful number, but often not good for comparison purposes
- elapsed time = CPU time + wait time (I/O, other programs, etc.)

CPU time
- Doesn't count waiting for I/O or time spent running other programs
- Can be divided into user CPU time and system CPU time (OS calls)
  - CPU time = user CPU time + system CPU time
  - elapsed time = user CPU time + system CPU time + wait time

Our focus: user CPU time (CPU execution time or, simply, execution time)
- Time spent executing the lines of code that are in our program

CPU Clocking: Review
Operation of digital hardware is governed by a constant-rate clock.
[Figure: clock signal; in each clock period (cycle), data transfer and computation happen, then state is updated]
- Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
- Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz

CPU Clocking: Review
Clock rate (clock cycles per second, in MHz or GHz) is the inverse of clock cycle time (clock period):

$$CC = \frac{1}{CR}$$

Clock cycle            Clock rate
10 nsec                100 MHz
5 nsec                 200 MHz
2 nsec                 500 MHz
1 nsec ($10^{-9}$ s)   1 GHz ($10^{9}$ Hz)
500 psec               2 GHz
250 psec               4 GHz
200 psec               5 GHz

Performance Equation I

$$\underbrace{\frac{\text{seconds}}{\text{program}}}_{\text{CPU execution time for a program}} = \underbrace{\frac{\text{cycles}}{\text{program}}}_{\text{CPU clock cycles for a program}} \times \underbrace{\frac{\text{seconds}}{\text{cycle}}}_{\text{Clock cycle time}}$$

CPU Time
So, to improve performance one can either:
- reduce the number of cycles for a program, or
- reduce the clock cycle time (or, equivalently, increase the clock rate).
Important point: changing the cycle time often changes the number of cycles required for various instructions, because it means changing the hardware design.
- Hardware designers must often trade off clock rate against cycle count.
- Many techniques that decrease the number of clock cycles also increase the clock cycle time.

CPU Time Example
A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$

Instruction Count and CPI
- CPI: average cycles per instruction
  - Determined by CPU hardware
  - If different instructions have different CPI, the average CPI is affected by the instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

CPU performance is dependent upon three characteristics:
- clock cycle (or rate)
- clock cycles per instruction
- instruction count
It is difficult to change one parameter in complete isolation from the others, because the basic technologies involved in changing each characteristic are interdependent:
- Clock cycle time: hardware technology and organization
- CPI: organization and instruction set architecture
- Instruction count: instruction set architecture and compiler technology

CPI Example
- Computer A: Cycle Time = 250 ps, CPI = 2.0
- Computer B: Cycle Time = 500 ps, CPI = 1.2
- Same ISA
Which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

A is faster… by this much: a factor of 1.2.

CPI in More Detail
If different instruction classes take different numbers of cycles:

$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$

Weighted average CPI:

$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.

CPI Example
Alternative compiled code sequences using instructions in classes A, B, C. What is the average CPI?

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

- Sequence 1: IC = 5
  - Clock Cycles = 2×1 + 1×2 + 2×3 = 10
  - Avg. CPI = 10/5 = 2.0
- Sequence 2: IC = 6
  - Clock Cycles = 4×1 + 1×2 + 1×3 = 9
  - Avg. CPI = 9/6 = 1.5

Performance Summary: The BIG Picture

$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

What affects each factor:

                       Instruction_count   CPI   clock_cycle
Algorithm              X                   X
Programming language   X                   X
Compiler               X                   X
ISA                    X                   X     X
Core organization                          X     X
Technology                                       X

A Simple Example

Op       Freq   CPI_i   Freq × CPI_i
ALU      50%    1       0.5
Load     20%    5       1.0
Store    10%    3       0.3
Branch   20%    2       0.4
                        Σ = 2.2

Freq × CPI_i under each change:

Op       Base   Loads 2 cycles   Branches 1 cycle   2 ALU ops at once
ALU      0.5    0.5              0.5                0.25
Load     1.0    0.4              1.0                1.0
Store    0.3    0.3              0.3                0.3
Branch   0.4    0.4              0.2                0.4
Σ        2.2    1.6              2.0                1.95

- How much faster would the machine be if a better data cache reduced the average load time to 2 cycles?
  CPU time_new = 1.6 × IC × CC, so 2.2/1.6 means 37.5% faster
- How does this compare with using branch prediction to shave a cycle off the branch time?
  CPU time_new = 2.0 × IC × CC, so 2.2/2.0 means 10% faster
- What if two ALU instructions could be executed at once?
  CPU time_new = 1.95 × IC × CC, so 2.2/1.95 means 12.8% faster

Workloads and Benchmarks
- Benchmarks: a set of programs that form a "workload" specifically chosen to measure performance
- SPEC (System Performance Evaluation Cooperative) creates standard sets of benchmarks, starting with SPEC89. The latest is SPEC CPU2006, which consists of 12 integer benchmark…