一种简化的Turbo码并行译码方法

上传人：1*** IP属地：上海上传时间：2024-10-23 格式：DOCX 页数：7 大小：231.27KB 积分：15 举报 版权申诉

已阅读5页，还剩2页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

PAGE

ASimplifiedParallelDecodingSchemeforTurboCodes

WangLe,HuangYu,YangHongwen

WirelessCommunicationCenter,BeijingUniversityofPostsandTelecommunications,Beijing,China(100876)

Abstract

ParalleldecodingofTurbocodesisvitaltotheapplicationswithveryhighdatarates.Intheoverlappingparalleldecodingscheme,thedecodingthroughputcannotincreaselinearlywiththenumberofsegments.Aschemewherethethroughputcanincreaselinearlywasproposed,wheretheboundaryinformationofeachiterationisstoredtobetheinitialconditioninnextiteration.Inthispaper,wehavesimplifiedtheschemebyonlystoringtheindexofthemostprobablestateoftheboundary.Withthecostofsomesmallperformanceloss,theschemecanreducelargelytheextramemoryrequiredtostoretheboundaryinformation.

Keywords: Turbocodes;MAPalgorithm;paralleldecoding

Introduction

Withtheever-increasingdemandofthedatarate,turbocodeisfacingstringentchallenges.ITUsuggeststhatthedatarateof4Gsystembeatleast100Mbps,inmobileconditions.Asthelocaloperatingclockfrequencyinhardwaresystemistypicallywithin100MHznowadays,itisreported[1]thatthethroughputofonly6.17Mbpscanbeachievedforturbocodeswithconventionalserialdecoding[2].Obviously,thedecodingalgorithmofturbocodehastobeimprovedsignificantlytosurviveinthefuturecommunicationsystems.

Thebottleneckofthethroughputliesinitsdecodingscheme.Theconvolutionalcodesistypicallyemployedastheconstituentcodes,leadingtothedecodingalgorithmemployssomerecursiveschemesbasedontrellis,suchassymbol-by-symbolMaximumAPosterioriprobability(MAP)algorithm[3].Itrequireslotsofrecursivecomputationsinbothcomponentdecoders.Moreover,thedecodingprocedureisiterative.Hence,thedecodingdelayisquitelarge.

Toreducethedecodingdelayandimprovethedecodingthroughput,theparalleldecodingmethodwasproposed[4]bydividingthewholeframeintomultiplesub-blocksanddecodingthesegmentswithabankofidenticalsub-blockMAPprocessors.Withtheparalleldecodingscheme,thethroughputcanbeimprovedbyafactorofaboutK,thenumberofsub-blockssegmented.Underthesameconditionsasmentionedpreviously,thethroughputof100MbpscanbeachievedwithK=16.

However,asaresultofthesegmentation,eachsub-blocksuffersthelostofboundaryinformation(initialconditions),whichisvitalfortheperformance.Togettheboundaryinformation,amethodwithsegmentsoverlappingisproposed[4],bycomputingsomemoreredundanttrellisstages.However,suchoverlappingsufferssomeextracomputationaldelay,andleadstolowerdecodingspeed.TheactualspeedupfactorissmallerthanK,especiallywhenKislargeandtheframelengthissmall.Anotherwaytogettheboundarydistributioninformationwasproposedin[5].Itsuggeststousetheforwardandbackwardvariablescomputedinthepreviousiterationtoprovideanapproximationtotheboundary.Simulationresultsshowthatitsperformanceisveryclosetotheserialdecodingschemein[2].Thecostofthismethodissomeadditionalmemoryneededtostoretheintermediateboundaryinformation.

Toreducethesizeoftheadditionalmemoryin[5],wesuggestthatinsteadofthewholeboundarydistributioninformation,onlytheindexofthemostprobablestatebestored.Withthissimplifiedstoragescheme,wecansavealotofmemoryatthecostofsomeacceptableperformanceloss.

Therestofthepaperisorganizedasfollows.First,section2givesabriefdescriptionofthestandardserialdecodingtoshowitsweakpointindecodingspeed.Section3showsthetwoparalleldecodingmethodsproposedin[4-5].Ourproposedschemeisgiveninsection4andsection5isthesimulationresults.Finally,section6concludesthepaper.

SerialdecodingschemeofTurbocodes

TurbocodesintroducedbyBerrouetal[2]haveanear-Shannoncapability,thusbeingveryattractivetovariousapplications.However,thelargedecodingdelayprohibitsitsapplications,especiallyforthosetimeconstraintorhighdatarateservices.

ThedecoderofTurbocodesemploysaseriallyconcatenatedstructure,whichconsistsoftwocomponentdecoders,asshowninFig.1.Thedecodingproceedsinaniterativemanner.Ineachiteration,thefirstcomponentdecoder,correspondingtothefirstcomponentencoder,deliversitscalculatedextrinsicinformation(Le1)tothe2ndcomponentdecoderafterinterleavingtoserveastheapriori

information(La2).Thenthesecondcomponentdecoderfeedbacksitsextrinsicinformation(Le2)tothefirstoneafterde-interleaving,workingasaprioriinformation(La1).xandxinFig.1are

receivedinformationbitsanditsinterleavedversion,respectively.The

y1and

y2arereceived

redundantbitsproducedbythecomponentRecursiveSystematicConvolutional(RSC)encoders,respectively.

La1x

Le1

La2

Le2

LLRs

Figure1.DecoderofTurboCodes

ThefunctionofthecomponentdecoderistocalculatetheLLRs(LogarithmofLikelihoodRatio)fortheinformationbits.ThecalculationofLLRsgenerallyemploystheMAPalgorithm[3]wheretheLLRsfortheithinformationbitofalengthNframecanbeexpressedas

i(s)i(s)i(s,1)

LLRiLog

i(s)i(s)i(s,0)

(1)

wherethesummationinnumeratoranddenominatorisoverthesetofbranchesintrellisdiagram,inwhichtheinformationbitis1or0,respectively.NotethatwearenotgoingtoexplainthedetailsoftheMAPalgorithmoritsvariants;rather,wesimplyshowtherecursivenatureinthealgorithmsthat

prohibitthedecodingspeed.Therearethreekindsofvariablesin(1).

i(s)

and

i(s)

denotethe

forwardandbackwardvariablesintheithtrellisstage,

is,d

isthebranchmetriccorresponding

thebranchwhichstartsfromstatesandtheinformationbitisd.

Thei

in(1)canbedirectlycalculatedwithchannelinputsandtheaprioriinputswithinoneclock

cycle,providedthatalltheinputsnecessaryarereadytouse.Itistherequiresrecursivecalculationasshownin(2)and(3)forbinaryRSC:

i(s)

and

i(s)

which

i(s)i1(s1)i(s1,1)i1(s0)i(s0,0),

i1,2...N1

(2)

i(s)i1(s)i1(s,1)i1(s)i1(s,0),

iN1,N2...1

(3)

wheresd

isthepreviousstatecomingtostateswiththeinputinformationbitsasd.

isthe

arrivingstateviathecurrentstateswithinputbitasd.Theinitialconditionfortherecursivecalculationis

(s)1

else

(4)

(5)

N else

Notethattheinitialcondition(5)onlyappliesfortheRSCwhichisterminated[6].IfRSCisopen[2],theinitialconditionshouldbe

N(s)2m, s

(6)

where2misthenumberofthetrellisstates.ThewholeprocessisshowninFig.2.

LLRs

Figure2.theoperationflowchartforMAPalgorithm

Apparently,tocalculatei,weshouldcalculate

first.Andtocalculate

N1,wehavetowait

forallpastibeenobtainedandthesituationissimilartoi.Thus,theprocessingforone

componentdecoderinoneiterationcannotbefinishedwithinNclockcycles(someextraclock

cyclesmayberequiredtoexecutesomeotheroperations).Iftheclockfrequencyis

fc,thenthedelay

forprocessingoftheMAPismorethan

Tmap

.Supposethatthemaximumiterationis

Imax,then

thedelayfordecodingonecodewordofturbocodeismorethanT

2T I

2NI

,leadsto

dec map

max

ainformationthroughputlessthan

RNTdec

2Imax

.Forexample,when

fc100MHz

and

Imax8,thethroughputcannotbelargerthan6.25Mbps.Notethatwithsomeearly-stopiterationstrategy[7],theactualnumberofiterationstosuccessfullydecodeonecodewordmaybemuchless

than

Imax,butthehardwarehavetobedesignedwith

Imax.

Paralleldecodingmethods

Toreducethedecodingdelayandimproveitsthroughput,wecandividethewholecodewordintomultiplesub-blocksanddecodethemwithmultipleMAPprocessorsinparallel.However,thesegmentsinthemiddlehavenoinitialcondition[4-5].Withrandomorarbitraryinitializationas(6)willleadtoaperformancelossthatisunacceptable.Sodirectlyworkingateachsegmentfacesaproblemofhowto

initializetheforwardandbackwardvariablesattheboundaryofeachsegment.Asshownin(2),

dependsonallpast

js,jk.Notethatthefarawaythejleavesk,thelesseffect

will

j(s)

haveon

k(s).Thus,tohaveagoodaccuracyforthe

k(s)

atthebeginningofeach

sub-block,weshouldstarttheprocessingof(2)earlieratjkL

withLlargeenough[4].This

isshowninFig.3.Foracertainsub-blockoflengthMlocatingfrompositionktokM1,we

startthecalculationfromkL

with

kLs

beingarbitrarilyinitializedas(6).Theposition

tok1

belongstotheprevioussub-block,hencetheparallelprocessersarepartially

overlapped.Thecalculationof

k(s)

isexactlythesame.Tohaveagoodaccuracyforthe

kM1(s)

attheright-handbeginningpointofthesegment,weshouldstartfromjkM1L.

overlappingarea

kM(s)

…

kL(s)

…

k1(s)

k(s)

…

kM1(s)

Figure3.theoverlappingmethodin[4]

Therefore,toproducealltheforwardvariables

i(s)

inonecomponentdecoderwithinoneiteration,

thewholeprocessingofthecodewordwithKsegmentsandLoverlappingrequiresNL

clock

cyclesinsteadofN cycles,andthedecodingspeedwillbesloweddownbyafactorof N .

K NLK

LargertheLandKis,slowerthedecodingspeedwillbe.Forinstance,assumingthattheblocklengthis2298[8],ifwesegmentitinto50sub-blocks,andtheoverlappinglengthis30,thenitrequires76clockcyclestoproducealltheforwardvariables,insteadof46cycles.Hence,theMAPdecodingspeedcanonlybeimprovedby30times,farlessthanK50,whichisoriginallysupposedtobe.

Tosolvesuchproblem,amethodwithstoragefeaturewasproposedin[5].Insteadofoverlappingtechnique[4],itutilizestheforwardandbackwardvariablescomputedinthepreviousiterationtogettheboundaryinformationforeachsub-block.Forexample,forthep+1thiterationandthesame

sub-blocklocatingfromkto

kM1,thecalculationof(2)willstartatkwithinitialcondition

p1s

beapproximatedwith

ps,thefinalresultsinlastiteration.ThisisshowninFig.4.

(s)

(p)

Thebackwardvariableisprocessedinthesameway.

(p)(s)

(p1)(s)

kM1

Figure4.themethodin[5]

Comparedwiththeoverlappingmethod,suchmethodrequiresnoredundantcomputations.Hence,thedecodingspeedcanincreaselinearlywiththenumberofsegments.However,itrequiresanextra

memoryofsize2m(K1)v

tostorethefinalresultsof

inlastiteration,wherevisthe

numberofbitsusedtoquantizetheps.

Proposedmethod

Forsomeapplicationssuchassmallmobileterminals,itispreferredthatthememoryrequiredshouldbeassmallaspossible.Tokeepthefeatureofthemethodsin[5]butsavethememory,inthispaperweproposedtostoretheindexofthemostprobablestateinsteadofalltherealvaluesofintermediate

boundary(e.g.

(p)(s)inFig.4),computedinthepreviousiteration.Withthestatenumberof2m,

weonlyneedtostorembits,ratherthan2mvbits.Thismaybealargesavingifmandvislarge.

Atp+1thiteration,forthesamesub-blocklocatingfromkto

kM1,thecalculationof(2)will

startatkandtheinitialconditionissetas(inlogarithmdomain)

logp1s 0

ss*

(7)

else

whereisapredeterminedparameterwhichonlydependsonthenumberofiterationsandcanbe

optimizedviasimulation,s*argmaxlogpsistheindexofthemostprobablestateinthe

previousiteration.Withthismethod,theonlythingweneedtostoredfornextiterationisrequiresmbits.

Simulationresults

s*,which

Inthissectionwewillshowtheperformanceoftheproposedmethod.Thesimulationadoptstheturbocodedefinedincdma2000[9],theinterleaverlengthischosenasN=2298,themaximumiteration

numberis

Imax8.Thenumberofsub-blockis

K20

implyingthatthehardwareMAPprocesser

consistsof20parallelprocessorsandeachprocesses

2298/20115

bits.Bothlogmapand

max-logmapdecodingalgorithmsareconsideredandFER(frameerrorrate)andBER(biterrorrate)areobserved.

Figure5.FERperformanceofcoderate=1/3

Fig.5includesfourFERperformancecurvesforcoderate1/3,usingmax-logmaporlogmapdecodingalgorithmrespectively.AtFER=0.01,formax-logmapalgorithm,theperformancelossoftheproposedmethodisabout0.07dB,comparedwiththemethodin[5].Andforlogmap,theperformancelossatFER=0.01isabout0.1dB.

Figure6.FERperformanceofcoderate=3/4

InFig.6,wecanseethatforcoderate3/4,theperformancelossislargerthanthatofcoderate1/3.AtFFR=0.01,theperformancelossisabout0.3dBforbothlogmapandmax-logmapdecodingalgorithm.WehavealsothesimulatedBERperformancewhicharenotshownhereduetolimitedspace.TheconclusionforBERperformanceisroughlythesame.

Conclusion

Inthispaper,wesimplifiedthemethodin[5],byonlystoringtheindexofthemostprobablestate,insteadofallvaluesofintermediateboundary.Theproposedschemecanreducethesizeofmemorylargely,withsomeacceptableFERorBERperformanceloss.

References

“Throughput,latencyandcomplexityestimationfortheturbodecoder,”3GPPTSGRANWG1#46,R1-062155,

Tallinn,Estonia,Aug

人人文库> 全部分类> 图纸下载 > 毕业设计

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

一种简化的Turbo码并行译码方法

文档简介

温馨提示

最新文档

评论

一种简化的Turbo码并行译码方法

文档简介

温馨提示

最新文档

评论

相关文档