计算机体系结构课件_第1页
计算机体系结构课件_第2页
计算机体系结构课件_第3页
计算机体系结构课件_第4页
计算机体系结构课件_第5页
已阅读5页,还剩56页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

Branch

InstructionsSima,

Fountain

and

KacsukChapter

8CSE33041Major

Chapter

GoalsTo

understand

how

to

minimise

theperformance

degradation

of

branchesBasic

approach

to

branchhandlingDelayed

branchingBranch

processingMulti-way

branchingGuarded

Execution2

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Types

of

branch

instructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch

toSubroutineReturn

fromSubroutineLoop

closingconditionalbranchOtherconditionalbranchAlpha: BR

taPowPC: btaBSR

tabl

taRET

tabclrBTL

R1,

tabdnz

taBNE

R1,

tabne

ta3

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Types

of

branch

instructionsUnconditionalBranchesConditionalBrancheslBranch

toSubroutineomneOtherconditionalbranchAlpha:SimBRple

taUnconditionaPowPBC:ranbctahBSR

ta

ReturnRETfrt

bl

ta

Subroutibclra

LBTLoop

Rcl1,ostaingconditionalbdnbrzatanchBNE

R1,

tabne

taBR

tata:BSR

tata:RETta:

BNE

R1,taSUBL

R1,1,R1

ta:BLT

R1,ta

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley19974Why worry

about

different

typesof

branches?

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Branches

cause

stalls

ofpipelinesDifferent

techniques

for

minimizing

brancheffectsTake

advantage

of

different

type

of

branche.g.

Unconditional

branch

ALWAYS

occurs.Therefore,

hardware

can

plan

for

the

branch

inadvance.e.g.

difference

between

loop

closing

andnormalconditional5Ways

of

checking

condition6

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Conditional

branches

need

to

evaluate

apredicateTwo

mainapproachesResult

stateIBM

360,

370,PDP-11,VAX-11,

x86,

Pentium,MC68000,

Sparc,

PowerPC.Direct

CheckPDP-10,

Cyber/70,

PDP-8,

CRAY,

MIPS,HPPA,Dec

AlphaResult

State7

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Result

state

is

declared

to

holdstatusinformation

related

to

result

of

operationTypical

implementation

is

condition

codesor

flag

registers

which

are

updatedafterevery

arithmetic

result

is

producedConditional

branch

instructions

interrogatethe

flags

in

subsequent

instructionsResult

state

disadvantagesThe

generation

of

result

state

is

notstraightforward;irregular

structureoccupies

additional

chip

areaMakes

pipeline

longerALUTest>

0<

0=

01

Clock

Cycle8

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Result

state

disadvantages

...

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Sequential

in

concept.–

How

do

we

pack

instructions

in

Superscalar

andVLIW

machine?add

r1,

r2,

r3blt

ta1subr5,

r6,

r7bgt

ta2add

r1,

r2,

r3blt

ta1subr5,

r6,

r7bgt

ta29Direct

Check

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997No

result

state

is

declaredSpecified

conditions

are

directly

checked

byexplicit

instructionsConditional

branching

can

be

requested

ifthe

specified

conditions

aremet.May

involve

one

or

twoinstructionsFits

better

into

superscalar

architecturesShorter

clock

cycle

because

only

checkwhen

necessary10Comparison

of

conditionalbranchesTwo

instruction

Implementationadd r1,

r2,

r3cmpeq

r7,

r1,

0bt

r7,

labeldiv r5,

r4,

r1One

instruction

Implementationadd r1,

r2,

r3beq

r1,

labeldiv r5,

r4,

r111

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997The

effect

of

branchesNeed

to

understand

whether

branches

aretaken

or

notThen

tailor

the

architecture

to

performfastest

on

most

commoncase.It

turns

out

that

most

branches

are

taken

…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Branch statistics

...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch

toSubroutineSubroutineReturn

from Loop

closingconditionalbranchOtherconditionalbranch~

1/3~

1/3~

1/3Takenforfirst

n-1iterationsTakenNotTaken~

1/6~

1/6Taken~

5/6

David

Abramson,200013Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Branch

HandlingUtilizing

branchdelay

slotsHandling

ofunresolvedconditional

branchesAvoiding

cond.branchesDelayedBranchingbranchproc.Blocking

Speculativebranch

proc.Multiwaybranchproc.Branch

processingGuardedExecutionPerformance14

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Branch

Handling15

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Delayed

branchingUtilise

otherwise

wasted

cycles

followingbranches.

Achieved

by

inserting

aninstructionbehind

the

branch

and

executing

it

before

thebranchUtilized

on

Early

and

SubsequentRISCarchitecturesDelayed

branchingFetchDec/RdALUWriteFetchDeadDeadFetchDec/RdALUWriteaddbsubadd r1,

r2,

r3b

anywhereanywhere:

sub

………..div r5,

r4,

r1add r3,

r2,

r616

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Delayed

branching

...add r1,

r2,

r3bd

anywhereanywhere:

sub

………..div r5,

r4,

r1add r3,

r2,

r6Delayed

Branch

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997addbdivaddsubFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrit1e7Delayed

branching

options18

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Can

reduce

to

single

delay

slot

iftargetaddress

available

at

end

of

decode

phaseCompiler

places

NOPs

in

slots

andmigratesinstructions

into

slots

using

code

migrationCompiler

must

perform

dataflowanalysisCodemigration

to fill

slotsbdanywhereanywhere:add r3,

r2,

r6div r5,

r4,

r1add r8,

r9,

r10NOPNOPadd r3,

r2,

r6bdanywheresub

………..

anywhere:sub

………..div r5,

r4,

r1add r8,

r9,

r1019

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Does

this

work

with

conditionalbranches/?Yes,

but

code

migrated

into

delay

slot

mustbe

performed

unconditionallybeqr6,anywhereadd r3,

r2,

r6div r5,

r4,

r1add r8,

r9,

r10NOPNOPadd r3,

r2,

r6div r5,

r4,

r1add r8,

r9,

r10beq

r6,anywhere20

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Optimisations

-

AnnulmentAnnul

delay

slot

if

branch

not

takenUseful

for

backward

conditional

branchesMakes

it

possible

to

move

a

loop

bodyinstruction

into

the

delay

slotAnnul

delay

slot

if

branch

is

takenUseful

for

forward

conditional

branchesMakes

it

possible

to

relocate

an

instruction

fromsequential

path

into

the

delay

slot21

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Annul

delay slot

if

branch

nottaken1234BrCDelayLoop

bodyConditionalBranch

with

noAnnulment22

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Loop

requires

5instructionsAnnul

delay slot

if

branch

nottaken

...1234BrCDelayLoop

bodyConditionalBranch

withAnnulment1Loop

requires

4instructions23

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Annul

delay slot

if

branch

istakenConditionalBranch

with

noAnnulment123BrCDelay4524

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Annul

delay slot

if

branch

istakenConditionalBranch

withAnnulment123BrC4525

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997FetchDec/RdALUWriteALUWriteFetchDFec/RetchdDAec/RLUdWriteFetchDec/RdALUWriteBranch

detectionIn

general,

the

earlier

a

branch

is

detected,the

better

the

handlingReduces

number

of

delay

slots

requiredEarly

branchdetectionInparallelLook

aheadIntegrated

withfetchFetchDec/RdALUWrite

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley199726Early

branch

detection

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997In

ParallelBranches

are

detected

in

parallel

with

decode

ofother

instructions

using

a

dedicatedbranchdecoderLook

aheadBranches

are

detected

from

the

instructionbuffer

but

ahead

of

general

instructiondecodingIntegratedBranches

are

detected

during

instruction

fetch27Conditional

branchesTake

branch?FetchDec/RdALUWriteThe

problem

with

conditional

branchesis

that

we

may

not

know

until

late

inthe

instruction

whether

we

wish

to

takeor

reject

the

controltransferbz

r1,taget

bzread

r1cmp

0WritePC28

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Handling

unresolved

conditionalbranchesBlocking

branch

processingSpeculative

branch

processingBranch

predictionExtent

of

speculationRecoveryfrom

mis-predictionMulti-way

branching29

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Blocking

branch

processingTrivial

approachExecution

of

conditional

is

simplystalled

until

condition

is

knownResult

state–

Can

possibly

order

instructions

toreduce

delay

in

waiting

forconditioncodes

to

be

setDirect

checking–

Need

to

wait

for

ALU

result

in

thisinstructionSetCCBrCFetchDec/RdALUWrite30

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Speculative

branch

processingPipeline

stalls

can

be

avoidedAfter

detection

of

unresolved

conditionalbranch

a

guess

is

made

of

theoutcomeIf

guess

iscorrectSpeculation

confirmed

and

continued

executionIf

guess

isincorrectInstructions

discardedFetch

restarted31

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Branch

prediction

schemesFixed

predictionTruepredictionA

“true”

guess

is

madeStatic

PredictionBased

on

object

codeDynamic

PredictionBased

on

execution

history32

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Fixed

Prediction(Guess

Not

Taken)

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Detect

unresolved

conditional

branch

and

guess

asnot

takenContinue

with

the

execution

of

the

sequential

path,but

start

the

execution

of

taken

path

(e.g.

calculateBTA)When

conditional

becomes

available,

check

theguessIf

correct,

continue

with

execution

delete

taken

pathpre-processingIf

incorrect,

delete

speculative

execution

andcontinue

with

taken

path33Fixed

Prediction(Guess

Not

Taken)ConditionKnownEvaluate

BTA34

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997CORRECTFixed

Prediction(Guess

Not

Taken)ConditionKnownEvaluate

BTA35

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997INCORRECTFixed

Prediction(Guess

Taken)Evaluate

BTAINCORRECTConditionKnown36

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Static

PredictionLike

fixed

prediction,

but

some

properties

ofthe

object

code

are

used

to

decidewhether

to

use

always

taken

or

nottakenHas

possibility

of

performing

bettersincesome

information

is

used.37

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Static

Prediction

...Typical

options

areOpcode

based

predictionFor

certain

opcodes

the

branch

isassumed

to

be

taken,

for

others

nottaken.Displacement

based

predictionIf

displacement

<

0

taken,

else

nottakenCompiler

directedPredict

bit

in

instruction

set

bycompiler

analysis

or

trace

datapOBTASign

bit38

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Dynamic

predictionPrediction

is

made

on

branchhistoryPhilosophy

is

that

history

is

goodguide

to

futurebehaviourGood

for

loops

because

tendto

iterate

multiple

timesGood

for

some

conditional

branchesdepending

on

behaviour

of

dataGood

for

exception

conditioncheckingWhat

HappenedLast

Time?

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley199739Dynamic

Prediction

Techniques

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Explicit

techniqueBranch

history

explicitly

stated

historybits1,

2or3bit

schemesNumber

of

bits

represent

likelihood

ofbranchbeing

taken

infutureImplicit

techniqueBranch

being

“recorded”

is

an

indication

thatbranch

was

takenRoughly

equivalent

to

1

bitscheme40Branch

History

BitsFor

each

branch,

record

a

statusfield.One

bitstatusDid

last

occurrence

of

branchoccur?Two

bitstatusState

table

decideswhich

stateThree

bitschemeHistory

of

last

3

branches41–

Majority

decision

onoutcome

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997State

transitions

in

2 bit

schemeFor

eachbranchAT

=

Actually

takenANT

=

Actually

not

takenStronglyTakenWeaklyTakenWeaklyNOTTakenATANTANTStronglyNOTTakenANTAT

ATATANTPrediction

Taken

David

Abramson,2000Prediction

Not

Taken42Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997State

transitions

in

2 bit

schemeFor

eachbranchAT

=

Actually

takenANT

=

Actually

not

takenStronglyTaken

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997ATANTAT

ATATANTStronglyNOTTakenANTWeaklyNOTTakenANTANT

AT AT

ANT

ATWeaklyTaken43Implicit

Dynamic

Scheme

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Two

mainschemesBranch

Target

Access

Cache

(BTAC)Branch

Target

Instruction

Cache

(BTIC)Both

schemes

introduce

an

extra

cacheEntries

are

only

stored

for

taken

branchesNot

taken

branches

are

notstoresIf

an

entry

is

in

cache,

then

next

branch

istakenBehaves

like

1

bit44BTAC

and BTIC

implementation–

The

instruction

at

thetarget(BTIC)10002000Load

R1,

….Store

Branch

AddressPLUS–

The

target

addressitself(BTAC)100045

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley199720001000Load

R1,

….BTAC

implementation

...Load

R1,

….1000200010002000Whoops!Add

this

branchaddress

to

thecacheDon’t

take

branchSpeculativelyNext

timeTake

branchSpeculatively46

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Performance

differences

inBTAC

and

BTIC47

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997BTAC

delivers

address

of

branch

in

time

fornext

fetchBTIC

delivers

actual

instruction.

Can

beslower

because

it

does

not

requirefetchLookupBTACFetchDec/RdExeWriteLookupBTICFetchDec/RdExeWriteImplementation

of

History

bits

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Placement

of

history

bitsI-cacheAlpha,

UltraSparcBranch

History

Table

(BHT)Power

PC

604,

620,

R10000Branch

Target

Address

Cache

(BTAC)MC68060,

Pentium,

R8000I-cache

and

BHT

effectively

the

same48I-Cache

and BHT

(Power

PC

604)I-cache16KFour

way

set

associativeInstruction

fetch

addressBHT128x4entriesPredictionLogic4instr./cycle2

History

BitsTaken/Not

takenBTA

for

a

taken

guess4x1Instr.

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997DecodeQueueIssueQueue49Prediction

Accuracy50

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997P

=

fc

*Pc

+

fm

*Pmwhere–

fc: Probability

of

correctly

predictingbranchesfm:Pc:Pm:Probability

of

mis-predicting

branchesPenalty

of

correctly

predicted

branchesPenalty

of

mis-predictedbranchesPrediction

Accuracy

….51

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Recovery from

MispredictionFetchDec/RdALUWriteIf

branch

prediction

hardware

makes

wrongguess,

need

to

revert

to

alternative

path

ofexecution.?FetchDec/RdFetchALUDec/RdFetchWriteALUDec/RdWriteALUWrite52

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Recovery from

Misprediction

...For

pipelined

machines,

may

be

possible

toabort

register

writesStore

instructions

are

harder

torecoverCondition

codes

might

also

need

to

berestored.FetchDec/RdALU

WriteFetchDec/RdALU

WriteFetchDec/RdALU

WriteRegister

FileFetchDec/RdALUWrite53

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Schemes

to

shorten

mis-prediction

recoveryBasic

prior

measures

for

recoveryIn

a

“taken”

guess,

save

sequential

addressIn

a

“not

taken”

guess,pre-calculate

and

savebranch

target

addressRequires

two

address

registers

per

speculatedconditional

branchSave

thisaddressSave

thisaddress54

David

Abramson,2000Material

from

Sima,

Fountain

and

Kacsuk,

Addison

Wesley1997Schemes

to

sh

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论