版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Branch
InstructionsSima,
Fountain
and
KacsukChapter
8CSE33041Major
Chapter
GoalsTo
understand
how
to
minimise
theperformance
degradation
of
branchesBasic
approach
to
branchhandlingDelayed
branchingBranch
processingMulti-way
branchingGuarded
Execution2
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Types
of
branch
instructionsUnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch
toSubroutineReturn
fromSubroutineLoop
closingconditionalbranchOtherconditionalbranchAlpha: BR
taPowPC: btaBSR
tabl
taRET
tabclrBTL
R1,
tabdnz
taBNE
R1,
tabne
ta3
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Types
of
branch
instructionsUnconditionalBranchesConditionalBrancheslBranch
toSubroutineomneOtherconditionalbranchAlpha:SimBRple
taUnconditionaPowPBC:ranbctahBSR
ta
ReturnRETfrt
bl
ta
Subroutibclra
LBTLoop
Rcl1,ostaingconditionalbdnbrzatanchBNE
R1,
tabne
taBR
tata:BSR
tata:RETta:
BNE
R1,taSUBL
R1,1,R1
ta:BLT
R1,ta
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley19974Why worry
about
different
typesof
branches?
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Branches
cause
stalls
ofpipelinesDifferent
techniques
for
minimizing
brancheffectsTake
advantage
of
different
type
of
branche.g.
Unconditional
branch
ALWAYS
occurs.Therefore,
hardware
can
plan
for
the
branch
inadvance.e.g.
difference
between
loop
closing
andnormalconditional5Ways
of
checking
condition6
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Conditional
branches
need
to
evaluate
apredicateTwo
mainapproachesResult
stateIBM
360,
370,PDP-11,VAX-11,
x86,
Pentium,MC68000,
Sparc,
PowerPC.Direct
CheckPDP-10,
Cyber/70,
PDP-8,
CRAY,
MIPS,HPPA,Dec
AlphaResult
State7
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Result
state
is
declared
to
holdstatusinformation
related
to
result
of
operationTypical
implementation
is
condition
codesor
flag
registers
which
are
updatedafterevery
arithmetic
result
is
producedConditional
branch
instructions
interrogatethe
flags
in
subsequent
instructionsResult
state
disadvantagesThe
generation
of
result
state
is
notstraightforward;irregular
structureoccupies
additional
chip
areaMakes
pipeline
longerALUTest>
0<
0=
01
Clock
Cycle8
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Result
state
disadvantages
...
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Sequential
in
concept.–
How
do
we
pack
instructions
in
Superscalar
andVLIW
machine?add
r1,
r2,
r3blt
ta1subr5,
r6,
r7bgt
ta2add
r1,
r2,
r3blt
ta1subr5,
r6,
r7bgt
ta29Direct
Check
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997No
result
state
is
declaredSpecified
conditions
are
directly
checked
byexplicit
instructionsConditional
branching
can
be
requested
ifthe
specified
conditions
aremet.May
involve
one
or
twoinstructionsFits
better
into
superscalar
architecturesShorter
clock
cycle
because
only
checkwhen
necessary10Comparison
of
conditionalbranchesTwo
instruction
Implementationadd r1,
r2,
r3cmpeq
r7,
r1,
0bt
r7,
labeldiv r5,
r4,
r1One
instruction
Implementationadd r1,
r2,
r3beq
r1,
labeldiv r5,
r4,
r111
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997The
effect
of
branchesNeed
to
understand
whether
branches
aretaken
or
notThen
tailor
the
architecture
to
performfastest
on
most
commoncase.It
turns
out
that
most
branches
are
taken
…..FetchDec/RdALUWriteFetchDec/RdALUWriteFetch12
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Branch statistics
...UnconditionalBranchesConditionalBranchesSimpleUnconditionalBranchBranch
toSubroutineSubroutineReturn
from Loop
closingconditionalbranchOtherconditionalbranch~
1/3~
1/3~
1/3Takenforfirst
n-1iterationsTakenNotTaken~
1/6~
1/6Taken~
5/6
David
Abramson,200013Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Branch
HandlingUtilizing
branchdelay
slotsHandling
ofunresolvedconditional
branchesAvoiding
cond.branchesDelayedBranchingbranchproc.Blocking
Speculativebranch
proc.Multiwaybranchproc.Branch
processingGuardedExecutionPerformance14
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Branch
Handling15
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Delayed
branchingUtilise
otherwise
wasted
cycles
followingbranches.
Achieved
by
inserting
aninstructionbehind
the
branch
and
executing
it
before
thebranchUtilized
on
Early
and
SubsequentRISCarchitecturesDelayed
branchingFetchDec/RdALUWriteFetchDeadDeadFetchDec/RdALUWriteaddbsubadd r1,
r2,
r3b
anywhereanywhere:
sub
………..div r5,
r4,
r1add r3,
r2,
r616
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Delayed
branching
...add r1,
r2,
r3bd
anywhereanywhere:
sub
………..div r5,
r4,
r1add r3,
r2,
r6Delayed
Branch
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997addbdivaddsubFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWriteFetchDec/RdALUWrit1e7Delayed
branching
options18
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Can
reduce
to
single
delay
slot
iftargetaddress
available
at
end
of
decode
phaseCompiler
places
NOPs
in
slots
andmigratesinstructions
into
slots
using
code
migrationCompiler
must
perform
dataflowanalysisCodemigration
to fill
slotsbdanywhereanywhere:add r3,
r2,
r6div r5,
r4,
r1add r8,
r9,
r10NOPNOPadd r3,
r2,
r6bdanywheresub
………..
anywhere:sub
………..div r5,
r4,
r1add r8,
r9,
r1019
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Does
this
work
with
conditionalbranches/?Yes,
but
code
migrated
into
delay
slot
mustbe
performed
unconditionallybeqr6,anywhereadd r3,
r2,
r6div r5,
r4,
r1add r8,
r9,
r10NOPNOPadd r3,
r2,
r6div r5,
r4,
r1add r8,
r9,
r10beq
r6,anywhere20
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Optimisations
-
AnnulmentAnnul
delay
slot
if
branch
not
takenUseful
for
backward
conditional
branchesMakes
it
possible
to
move
a
loop
bodyinstruction
into
the
delay
slotAnnul
delay
slot
if
branch
is
takenUseful
for
forward
conditional
branchesMakes
it
possible
to
relocate
an
instruction
fromsequential
path
into
the
delay
slot21
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Annul
delay slot
if
branch
nottaken1234BrCDelayLoop
bodyConditionalBranch
with
noAnnulment22
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Loop
requires
5instructionsAnnul
delay slot
if
branch
nottaken
...1234BrCDelayLoop
bodyConditionalBranch
withAnnulment1Loop
requires
4instructions23
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Annul
delay slot
if
branch
istakenConditionalBranch
with
noAnnulment123BrCDelay4524
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Annul
delay slot
if
branch
istakenConditionalBranch
withAnnulment123BrC4525
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997FetchDec/RdALUWriteALUWriteFetchDFec/RetchdDAec/RLUdWriteFetchDec/RdALUWriteBranch
detectionIn
general,
the
earlier
a
branch
is
detected,the
better
the
handlingReduces
number
of
delay
slots
requiredEarly
branchdetectionInparallelLook
aheadIntegrated
withfetchFetchDec/RdALUWrite
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley199726Early
branch
detection
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997In
ParallelBranches
are
detected
in
parallel
with
decode
ofother
instructions
using
a
dedicatedbranchdecoderLook
aheadBranches
are
detected
from
the
instructionbuffer
but
ahead
of
general
instructiondecodingIntegratedBranches
are
detected
during
instruction
fetch27Conditional
branchesTake
branch?FetchDec/RdALUWriteThe
problem
with
conditional
branchesis
that
we
may
not
know
until
late
inthe
instruction
whether
we
wish
to
takeor
reject
the
controltransferbz
r1,taget
bzread
r1cmp
0WritePC28
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Handling
unresolved
conditionalbranchesBlocking
branch
processingSpeculative
branch
processingBranch
predictionExtent
of
speculationRecoveryfrom
mis-predictionMulti-way
branching29
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Blocking
branch
processingTrivial
approachExecution
of
conditional
is
simplystalled
until
condition
is
knownResult
state–
Can
possibly
order
instructions
toreduce
delay
in
waiting
forconditioncodes
to
be
setDirect
checking–
Need
to
wait
for
ALU
result
in
thisinstructionSetCCBrCFetchDec/RdALUWrite30
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Speculative
branch
processingPipeline
stalls
can
be
avoidedAfter
detection
of
unresolved
conditionalbranch
a
guess
is
made
of
theoutcomeIf
guess
iscorrectSpeculation
confirmed
and
continued
executionIf
guess
isincorrectInstructions
discardedFetch
restarted31
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Branch
prediction
schemesFixed
predictionTruepredictionA
“true”
guess
is
madeStatic
PredictionBased
on
object
codeDynamic
PredictionBased
on
execution
history32
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Fixed
Prediction(Guess
Not
Taken)
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Detect
unresolved
conditional
branch
and
guess
asnot
takenContinue
with
the
execution
of
the
sequential
path,but
start
the
execution
of
taken
path
(e.g.
calculateBTA)When
conditional
becomes
available,
check
theguessIf
correct,
continue
with
execution
delete
taken
pathpre-processingIf
incorrect,
delete
speculative
execution
andcontinue
with
taken
path33Fixed
Prediction(Guess
Not
Taken)ConditionKnownEvaluate
BTA34
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997CORRECTFixed
Prediction(Guess
Not
Taken)ConditionKnownEvaluate
BTA35
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997INCORRECTFixed
Prediction(Guess
Taken)Evaluate
BTAINCORRECTConditionKnown36
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Static
PredictionLike
fixed
prediction,
but
some
properties
ofthe
object
code
are
used
to
decidewhether
to
use
always
taken
or
nottakenHas
possibility
of
performing
bettersincesome
information
is
used.37
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Static
Prediction
...Typical
options
areOpcode
based
predictionFor
certain
opcodes
the
branch
isassumed
to
be
taken,
for
others
nottaken.Displacement
based
predictionIf
displacement
<
0
taken,
else
nottakenCompiler
directedPredict
bit
in
instruction
set
bycompiler
analysis
or
trace
datapOBTASign
bit38
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Dynamic
predictionPrediction
is
made
on
branchhistoryPhilosophy
is
that
history
is
goodguide
to
futurebehaviourGood
for
loops
because
tendto
iterate
multiple
timesGood
for
some
conditional
branchesdepending
on
behaviour
of
dataGood
for
exception
conditioncheckingWhat
HappenedLast
Time?
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley199739Dynamic
Prediction
Techniques
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Explicit
techniqueBranch
history
explicitly
stated
historybits1,
2or3bit
schemesNumber
of
bits
represent
likelihood
ofbranchbeing
taken
infutureImplicit
techniqueBranch
being
“recorded”
is
an
indication
thatbranch
was
takenRoughly
equivalent
to
1
bitscheme40Branch
History
BitsFor
each
branch,
record
a
statusfield.One
bitstatusDid
last
occurrence
of
branchoccur?Two
bitstatusState
table
decideswhich
stateThree
bitschemeHistory
of
last
3
branches41–
Majority
decision
onoutcome
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997State
transitions
in
2 bit
schemeFor
eachbranchAT
=
Actually
takenANT
=
Actually
not
takenStronglyTakenWeaklyTakenWeaklyNOTTakenATANTANTStronglyNOTTakenANTAT
ATATANTPrediction
Taken
David
Abramson,2000Prediction
Not
Taken42Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997State
transitions
in
2 bit
schemeFor
eachbranchAT
=
Actually
takenANT
=
Actually
not
takenStronglyTaken
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997ATANTAT
ATATANTStronglyNOTTakenANTWeaklyNOTTakenANTANT
AT AT
ANT
ATWeaklyTaken43Implicit
Dynamic
Scheme
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Two
mainschemesBranch
Target
Access
Cache
(BTAC)Branch
Target
Instruction
Cache
(BTIC)Both
schemes
introduce
an
extra
cacheEntries
are
only
stored
for
taken
branchesNot
taken
branches
are
notstoresIf
an
entry
is
in
cache,
then
next
branch
istakenBehaves
like
1
bit44BTAC
and BTIC
implementation–
The
instruction
at
thetarget(BTIC)10002000Load
R1,
….Store
Branch
AddressPLUS–
The
target
addressitself(BTAC)100045
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley199720001000Load
R1,
….BTAC
implementation
...Load
R1,
….1000200010002000Whoops!Add
this
branchaddress
to
thecacheDon’t
take
branchSpeculativelyNext
timeTake
branchSpeculatively46
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Performance
differences
inBTAC
and
BTIC47
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997BTAC
delivers
address
of
branch
in
time
fornext
fetchBTIC
delivers
actual
instruction.
Can
beslower
because
it
does
not
requirefetchLookupBTACFetchDec/RdExeWriteLookupBTICFetchDec/RdExeWriteImplementation
of
History
bits
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Placement
of
history
bitsI-cacheAlpha,
UltraSparcBranch
History
Table
(BHT)Power
PC
604,
620,
R10000Branch
Target
Address
Cache
(BTAC)MC68060,
Pentium,
R8000I-cache
and
BHT
effectively
the
same48I-Cache
and BHT
(Power
PC
604)I-cache16KFour
way
set
associativeInstruction
fetch
addressBHT128x4entriesPredictionLogic4instr./cycle2
History
BitsTaken/Not
takenBTA
for
a
taken
guess4x1Instr.
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997DecodeQueueIssueQueue49Prediction
Accuracy50
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997P
=
fc
*Pc
+
fm
*Pmwhere–
fc: Probability
of
correctly
predictingbranchesfm:Pc:Pm:Probability
of
mis-predicting
branchesPenalty
of
correctly
predicted
branchesPenalty
of
mis-predictedbranchesPrediction
Accuracy
….51
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Recovery from
MispredictionFetchDec/RdALUWriteIf
branch
prediction
hardware
makes
wrongguess,
need
to
revert
to
alternative
path
ofexecution.?FetchDec/RdFetchALUDec/RdFetchWriteALUDec/RdWriteALUWrite52
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Recovery from
Misprediction
...For
pipelined
machines,
may
be
possible
toabort
register
writesStore
instructions
are
harder
torecoverCondition
codes
might
also
need
to
berestored.FetchDec/RdALU
WriteFetchDec/RdALU
WriteFetchDec/RdALU
WriteRegister
FileFetchDec/RdALUWrite53
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Schemes
to
shorten
mis-prediction
recoveryBasic
prior
measures
for
recoveryIn
a
“taken”
guess,
save
sequential
addressIn
a
“not
taken”
guess,pre-calculate
and
savebranch
target
addressRequires
two
address
registers
per
speculatedconditional
branchSave
thisaddressSave
thisaddress54
David
Abramson,2000Material
from
Sima,
Fountain
and
Kacsuk,
Addison
Wesley1997Schemes
to
sh
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 中国信托eTrust银行网上银行查询功能登入及操作说明
- 旅行社管理信息系统-20220704071006
- 租房合同书样板房东出租
- 员工住宿协议书书
- 嘉兴风机安装施工方案
- 咖啡教学课程设计
- 吉首大学MATLAB课程设计
- 吉他成人课程设计
- 可回收锚索施工方案
- 变电站防火工作方案
- 小班《我会排队》课件
- 学习心理与辅导
- 嵌入式系统设计学习通课后章节答案期末考试题库2023年
- 生理学教学课件:Chapter 2 Basic Functions of the Cell (细胞的基本功能)
- 神木县赵仓峁煤矿矿山地质环境保护与土地复垦方案
- 《留守儿童心理健康教育的研究》课题结题报告
- 监理大纲光伏电站工程
- 新概念青少版入门级B-Unit2-3试卷
- 泵维修作业安全技术操作规程
- 房屋修缮工程施工方案
- 【课件】常见结构的认识+课件-2022-2023学年高中通用技术苏教版(2019)必修《技术与设计1》
评论
0/150
提交评论