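Lecture 8: Communication in Parallel Computing: Synchronization and Data Transfer

Outline
- Communication in parallel computing
  - Message exchanging
  - Data transmitting
- The CELL BE message-passing mechanism: Mailbox
- The CELL BE data-transfer mechanism: DMA

Communication in parallel computing
- Parallel computing: several processors/execution cores cooperate to solve one problem, each executing one sub-task.
- To complete a sub-task, a processor/execution core must resolve
  - Algorithm-design questions
    - What is the sub-task? Which data does it involve, and how large is it?
    - How does the sub-task relate to the whole problem and to the other sub-tasks?
  - Algorithm-implementation questions
    - Can the storage that holds the data be accessed directly?
    - If not, how is the data transferred into storage that the processor/execution core can access directly?

Example: finding primes
- Sub-task I: compute the primes in [baseScope + chunkSize*I, baseScope + chunkSize*(I+1)).
- The primes in [2, baseScope] have already been found and are stored in primes[].
- In primes[], the results of sub-task I must be stored before the results of sub-task I+1.
- Three problems therefore have to be solved
  - Synchronization: each sub-task is handed to one processor/execution core.
  - Sub-task description: the address of primes[], baseScope, chunkSize, and the starting index in primes[] of the results of sub-task I.
  - Data access: access to the elements of primes[].

What parallel computing requires of communication
- Synchronization: task assignment, to achieve load balance; meeting progress requirements (data dependences), so that the result is unique and correct.
  - Prime example: sub-task I+1 needs the result of sub-task I (the number of primes found) to determine the starting index of its own results in primes[].
- A description of the data being exchanged (size, storage address).
- The values of the data being exchanged.

[Figure: the problem data is split into data subset 1 and data subset 2, which are copied into the directly accessible storage of processor/execution core 1 and processor/execution core 2. Synchronization decides who performs sub-task 1 and sub-task 2, who may read or write the problem-space data, and when work may start. Each transfer is described by source address, destination address, and size.]

Classification of communication in parallel computing
- Implementing the algorithm logic of the parallel computation
  - Synchronization information for sub-tasks
  - Description information for sub-tasks
- Storing and accessing the problem-domain data

Communication on shared-memory parallel computers
- Shared-memory parallel computers: cache-based multi-core processors, SMP, software-based NUMA parallel computers (COW), and so on.
- Programming models: the POSIX threads library, OpenMP, and so on.
- How communication is realized
  - Algorithm logic of the parallel computation
    - Sub-task synchronization information is supported by
      - mutex locks, which implement critical sections and guarantee atomic operations
      - semaphores, which implement data dependences and guarantee deterministic results
    - Sub-task description information: a global storage space is provided; the information is kept in dedicated variables that every processor/execution core can access directly.
  - Problem-domain data storage and access: a global storage space is provided that every processor/execution core can access directly.

Communication on the CELL BE: the MFC
- Algorithm logic of the parallel computation: Mailbox
  - Synchronization information for sub-tasks
  - Description information for sub-tasks
- Problem-domain data storage and access: DMA data transfers

Storage structure of the CELL BE
- Three types of storage domains
  - the main-storage domain
  - 8 SPE local store domains
  - 8 SPE channel domains
- The main-storage domain, which is the entire effective-address space, can be configured by the PPE operating system to be shared by all processors and memory-mapped devices in the system (all I/O is memory-mapped).
- The local-storage and channel problem-state (user-state) domains are private to the SPU, LS, and MFC of each SPE.

MFC Commands
- The main mechanism by which SPUs
  - access main storage (DMA commands)
  - maintain synchronization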
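The sub-task description from the prime-number example earlier in this lecture (the address of primes[], baseScope, chunkSize, and the starting index of each sub-task's results) can be made concrete with a small sketch. The C code below is only an illustration under assumptions: the names subtask_desc, make_subtask, and assign_starts are hypothetical and come neither from the lecture nor from any CELL BE library. It shows why sub-task I+1 has to wait for the prime count of sub-task I: the starting indices are a running sum of the counts.

```c
#include <stddef.h>

/* Hypothetical descriptor for sub-task I of the prime-number example. */
typedef struct {
    unsigned int *primes;  /* address of the shared primes[] array        */
    unsigned int  lo;      /* baseScope + chunkSize * I       (inclusive) */
    unsigned int  hi;      /* baseScope + chunkSize * (I + 1) (exclusive) */
    size_t        start;   /* index in primes[] where its results begin   */
} subtask_desc;

/* Fill in the search range of sub-task i; the start index is assigned later. */
static subtask_desc make_subtask(unsigned int *primes, unsigned int base_scope,
                                 unsigned int chunk_size, unsigned int i)
{
    subtask_desc d;
    d.primes = primes;
    d.lo = base_scope + chunk_size * i;
    d.hi = base_scope + chunk_size * (i + 1);
    d.start = 0;
    return d;
}

/* Sub-task i+1 can fix its start index only after sub-task i has reported
 * how many primes it found: start[i+1] = start[i] + counts[i].
 * known_primes is the number of primes already stored in primes[]. */
static void assign_starts(subtask_desc *tasks, const size_t *counts,
                          size_t n, size_t known_primes)
{
    size_t next = known_primes;
    for (size_t i = 0; i < n; i++) {
        tasks[i].start = next;
        next += counts[i];
    }
}
```

With baseScope = 100 and chunkSize = 50, sub-task 2 covers [200, 250). If 25 primes are already stored and sub-tasks 0 and 1 report 10 and 7 primes, sub-task 2 starts writing at index 42.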

- Synchronization commands maintain synchronization with other processors and devices in the system.
- MFC commands can be issued either by the SPU through its own MFC, or by the PPE or another device, as follows:
  - Code running on the SPU issues an MFC command by executing a series of writes and/or reads using channel instructions: read channel (rdch), write channel (wrch), and read channel count (rchcnt).
  - Code running on the PPE or other devices issues an MFC command by performing a series of stores and/or loads to memory-mapped I/O (MMIO) registers in the MFC.
- MFC commands are queued in one of two independent MFC command queues:
  - MFC SPU Command Queue: for channel-initiated commands by the associated SPU
  - MFC Proxy Command Queue: for MMIO-initiated commands by the PPE or other device

Communication Between the PPE and SPEs
Three primary communication mechanisms exist between the PPE and the SPEs.
- Mailboxes
  - Queues for exchanging 32-bit messages.
  - Two mailboxes (the SPU Write Outbound Mailbox and the SPU Write Outbound Interrupt Mailbox) are provided for sending messages from the SPE to the PPE.
  - One mailbox (the SPU Read Inbound Mailbox) is provided for sending messages to the SPE.
- Signal notification registers
  - Each SPE has two 32-bit signal-notification registers. Each has a corresponding memory-mapped I/O (MMIO) register into which the sending processor writes the signal-notification data.
  - Signal-notification channels, or signals, are inbound (to an SPE) registers.
  - They can be used by other SPEs, the PPE, or other devices to send information, such as a buffer-completion synchronization flag, to an SPE.
- DMAs
  - Transfer data between main storage and the LS.
- The PPE communicates with the SPEs through MMIO registers supported by the MFC of each SPE.
- Software on the SPE's SPU interacts with the MFC through channels, which enqueue DMA commands and provide other facilities, such as mailboxes, signal notification, and access to auxiliary resources.

MFC DMA Commands
- MFC commands that transfer data are referred to as DMA commands.
- An SPE or the PPE performs data transfers between the SPE's LS and main storage primarily using DMA transfers controlled by the MFC DMA controller for that SPE.
- Each MFC can maintain and process multiple in-progress DMA command requests and DMA transfers.
- The MFC can autonomously manage a sequence of DMA transfers in response to a DMA-list command.
- Memory-mapped mailboxes or atomic MFC synchronization commands can be used for synchronization and mutual exclusion.
- DMA transfer requests contain both a local store address and an effective address. They can therefore address both an SPE's LS and main storage, and so initiate DMA transfers between the two domains.
- The data-transfer direction of DMA commands is always named from the perspective of an SPE:
  - get: transfer data into an SPE (from main storage to local store)
  - put: transfer data out of an SPE (from local store to main storage)

DMA APIs
- The PPE and the SPE use different APIs for DMA transfers and DMA-list transfers.
- The PPE uses the API provided by libspe.
- The SPE can use either of two APIs:
  - the MFC application API
  - the DMA composite-intrinsics API

SPE proxy command API (used by the PPE)
- These functions are provided by libspe. They offer PPE-initiated DMA transfers through the SPE MFC proxy command mechanism; a PPE thread can use them to access an SPE's local store.
    spe_mfcio_get()    spe_mfcio_getb()    spe_mfcio_getf()
    spe_mfcio_put()    spe_mfcio_putb()    spe_mfcio_putf()
- To use this API, include the header libspe2.h.
- The names follow the SPE-centric view: put means a transfer from the SPE to system memory, and get means the opposite direction.

MFC application API (used by the SPE)
- These functions are provided for programming convenience and are not strictly required. They are implemented as macros or compiler built-ins, and are used when the SPE initiates a DMA transfer.
- DMA transfer API
    mfc_put()     mfc_putb()     mfc_putf()
    mfc_get()     mfc_getb()     mfc_getf()
- DMA-list transfer API
    mfc_putl()    mfc_putlb()    mfc_putlf()
    mfc_getl()    mfc_getlb()    mfc_getlf()
- To use this API, include the header spu_mfcio.h.
- The names follow the SPE-centric view: put means a transfer from the SPE to system memory, and get means the opposite direction.

DMA composite-intrinsics API (used by the SPE)
- These composite intrinsics are built from a series of low-level intrinsics, mainly the channel-control intrinsics. No header file is needed.
    spu_mfcdma32()    spu_mfcdma64()

DMA Example: Read into Local Store

    inline void dma_mem_to_ls(unsigned
                              int mem_addr, volatile void *ls_addr,
                              unsigned int size)
    {
        unsigned int tag = 0;
        unsigned int mask = 1;                        /* 1 << tag */
        mfc_get(ls_addr, mem_addr, size, tag, 0, 0);  /* read contents of mem_addr into ls_addr */
        mfc_write_tag_mask(mask);                     /* set tag mask */
        mfc_read_tag_status_all();                    /* wait for all tagged DMA to complete */
    }

DMA Example: Write to Main Memory

    inline void dma_ls_to_mem(unsigned int mem_addr, volatile void *ls_addr,
                              unsigned int size)
    {
        unsigned int tag = 0;
        unsigned int mask = 1;                        /* 1 << tag */
        mfc_put(ls_addr, mem_addr, size, tag, 0, 0);  /* write contents of ls_addr into mem_addr */
        mfc_write_tag_mask(mask);                     /* set tag mask */
        mfc_read_tag_status_all();                    /* wait for all tagged DMA to complete */
    }

MFC DMA semantics, using the SPU-side mfc_get and mfc_put as examples
- DMA get from main memory into local store
    (void) mfc_get(volatile void *ls, uint64_t ea, uint32_t size,
                   uint32_t tag, uint32_t tid, uint32_t rid)
- DMA put into main memory from local store
    (void) mfc_put(volatile void *ls, uint64_t ea, uint32_t size,
                   uint32_t tag, uint32_t tid, uint32_t rid)
- Parameters
  - ls   = target address in SPU local store for fetched data (SPU local address)
  - ea   = effective address from which data is fetched (global address)
  - size = transfer size in bytes
  - tag  = tag-group identifier
  - tid  = transfer-class id
  - rid  = replacement-class id
- DMA read and write commands are non-blocking.

Replacement Class ID and Transfer Class ID
- transfer-class id: allows application software to influence the allocation of bus bandwidth for an MFC command.
- replacement-class id: allows privileged software to influence L2-cache and translation lookaside buffer (TLB) replacement for cache misses caused by the MFC command.
- The default class ID ('0') is used for all undefined or invalid class IDs. An invalid class ID does not generate an exception.

DMA-Command Tag Groups
- Each DMA command has a 5-bit tag; commands with the same tag value form a "tag group".
- The tag mask identifies tag groups for status checks. It is a 32-bit word in which each bit corresponds to a specific tag id:
    tag_mask = (1 << tag_id)
- Tags, tag groups, and tag masks are used for
  - checking the status of DMA commands
  - waiting for completion of DMA commands
- Tagging is optional, but it can be useful when barriers are used to control the ordering of MFC commands within a single command queue.

Synchronization of DMA commands within a tag group: fence and barrier
- Execution of a command with the fence option is delayed until all previously issued commands within the same tag group have been performed.
- Execution of a command with the barrier option, and of all subsequent commands, is delayed until all previously issued commands in the same tag group have been performed.

Barrier and fence
- Within one tag group, a barrier command waits until all commands issued before it have completed; only then are the barrier command and the commands after it executed.
- Within one tag group, a fence command guarantees that the commands issued before it have completed, but commands issued after the fence may still complete before the fence command does.
[Figure: commands issued after the barrier start only once all commands before the barrier have completed.]
[Figure: all commands before the fence have completed, and two commands issued after the fence have also completed before the fence command.]

To ensure the order of DMA request execution
- mfc_putf: fenced. All commands issued earlier in the same tag group must finish first; commands issued later may still complete before it.
- mfc_putb: barrier. The barrier command and all commands issued after it are not executed until all previously issued commands in the same tag group have been performed.

DMA Command Status (SPE)
- Set the tag mask (an unsigned int)
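The relation tag_mask = (1 << tag_id) given for DMA-command tag groups can be sketched in portable C. This is a host-side illustration, not SPE code: tag_mask_for, tag_mask_for_all, mask_selects_tag, and MFC_TAG_COUNT are hypothetical names chosen here, and only the 5-bit tag range and the one-bit-per-tag layout are taken from the lecture.

```c
#include <stdint.h>

#define MFC_TAG_COUNT 32u  /* a 5-bit tag allows tag ids 0..31 (name is hypothetical) */

/* Mask bit for one tag group: tag_mask = 1 << tag_id.
 * Returns 0 when tag_id does not fit in 5 bits. */
static uint32_t tag_mask_for(unsigned int tag_id)
{
    return tag_id < MFC_TAG_COUNT ? (UINT32_C(1) << tag_id) : 0u;
}

/* Combine several tag groups into one mask so that they can be
 * checked or waited on together. */
static uint32_t tag_mask_for_all(const unsigned int *tag_ids, unsigned int n)
{
    uint32_t mask = 0;
    for (unsigned int i = 0; i < n; i++)
        mask |= tag_mask_for(tag_ids[i]);
    return mask;
}

/* Non-zero if the given mask includes the tag group tag_id. */
static int mask_selects_tag(uint32_t mask, unsigned int tag_id)
{
    return (mask & tag_mask_for(tag_id)) != 0;
}
```

For instance, tag ids 0 and 5 together give the mask 0x21, a value that SPE code could then hand to mfc_write_tag_mask.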

    unsigned int tag_mask;
    mfc_write_tag_mask(tag_mask);    /* the tag mask remains set until changed */
- Fetch the tag status
    unsigned int result;
    result = mfc_read_tag_status();  /* or mfc_stat_tag_status(); */
  - The tag status is logically ANDed with the current tag mask.
  - A tag status bit of '1' indicates that no DMA requests tagged with the specific tag id (corresponding to the status bit location) are still either in progress or in the DMA queue.

Waiting for DMA Completion (SPE)
- Wait for any tagged DMA: mfc_read_tag_status_any() waits until any of the specified tagged DMA commands is completed.
- Wait for all tagged DMA: mfc_read_tag_status_all() waits until all of the specified tagged DMA commands are completed.
- The specified tagged DMA commands are the commands selected by the current tag mask setting.

DMA Characteristics
- DMA transfers
  - The MFC supports naturally aligned transfer sizes of 1, 2, 4, or 8 bytes, and multiples of 16 bytes (n*16, n an integer), with a maximum of 16 KB per DMA transfer.
  - 128-byte alignment is preferable: peak performance can be achieved when both the EA and the LSA are 128-byte aligned and the size of the transfer is an even multiple of 128 bytes.
- DMA command queues per SPU
  - 16-element queue for SPU-initiated requests
  - 8-element queue for PPE-initiated requests
  - SPU-initiated DMA is always preferable.
- DMA tags
  - Each DMA command is tagged with a 5-bit Tag Group ID. This identifier is used to check or wait on the completion of all queued commands in one or more tag groups.
  - The same identifier can be used for multiple commands.
  - Tags are used for polling status or waiting on completion of DMA commands.
- DMA lists
  - A single DMA command can cause execution of a list of transfer requests (held in LS).
  - Lists implement scatter-gather functions.
  - A list can contain up to 2K transfer requests.

Mailboxes
- Communicate messages up to 32 bits in length, such as buffer completion flags or program status.
  - Example: the SPE places computational results in main storage via DMA. After requesting the DMA transfer, the SPE waits for the transfer to complete and then writes to an outbound mailbox to notify the PPE that its computation is complete.
- Can be used for any short-data transfer purpose, such as sending storage addresses, function parameters, command parameters, and state-machine parameters.
- Can also be used for communication between an SPE and other SPEs, processors, or devices.
  - Privileged software needs to allow one SPE to access the mailbox register in another SPE by mapping the target SPE's problem-state area into the EA space of the source SPE. If software does not allow this, only atomic operations and signal notifications are available for SPE-to-SPE communication.
- Each MFC provides three mailbox queues
  - PPE ("SPU write outbound") mailbox queue
    - SPE writes, PPE reads
    - 1 deep
    - SPE stalls when writing to a full mailbox
  - PPE ("SPU write outbound") interrupt mailbox queue
    - like the PPE mailbox queue, but an interrupt is posted to the PPE when the mailbox is written
  - SPU ("SPU read inbound") mailbox queue
    - PPE writes, SPE reads
    - 4 deep
    - can be overwritten
- Each mailbox entry is a fullword: 32 bits.

PPE Access to Mailboxes
- The PPE can derive the "addresses" of mailboxes from the SPE thread id.
- PPE mailbox calls use spe_id to identify the desired SPE's mailbox.

Channels: the access interface on the SPE side
- SPE (outgoing): writes the 32-bit message value to either of its two outbound mailbox channels.
- SPE (incoming): reads a message from the inbound mailbox.

MMIO Registers: the access interface on the PPE side
- PPE and other devices (incoming): read a message in the outbound mailbox by reading the MMIO register in the SPE's MFC.
- PPE and other devices (outgoing): send by writing the associated MMIO register.
- For interrupts associated with the SPU Write Outbound Interrupt Mailbox, there is no ordering between the interrupt and previously issued MFC commands.

Accessing mailboxes: Mailbox API (libspe2)

SPU Write Outbound Mailbox

SPE write
- Sends the message data to SPU_WrOutMbox. If SPU_WrOutMbox is full, the instruction blocks.
    void spu_write_out_mbox(unsigned int data)
- Returns the number of messages that SPU_WrOutMbox can still accept. A return value of 0 means that SPU_WrOutMbox is full.
    unsigned int
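    spu_stat_out_mbox(void)

PPE read
- Returns the number of messages available in the SPU_WrOutMbox of the SPE identified by spe_id. If the mailbox is empty, the return value is 0. A positive return value is the number of messages in the mailbox that have not yet been read.
    int spe_out_mbox_status(spe_context_ptr_t spe_id)
- Reads at most count available messages from the SPU_WrOutMbox of the SPE identified by spe_id into the buffer that mbox_data points to, and returns the number of messages actually read. This function does not block. If fewer than count messages are available, only the available messages are read.
    int spe_out_mbox_read(spe_context_ptr_t spe_id, unsigned int *mbox_data, int count)

Channel semantics of SPU_WrOutMbox
- The value written to the SPU Write Outbound Mailbox channel SPU_WrOutMbox is entered into the outbound mailbox in the MFC if the mailbox has capacity to accept the value.
- If the mailbox can accept the value, the channel count for SPU_WrOutMbox is decremented by '1'.
- If the outbound mailbox is full, the channel count will read as '0'.
- If SPE software writes a value to SPU_WrOutMbox when the channel count is '0', the SPU will stall on the write.
- The SPU remains stalled until the PPE or other device reads a message from the outbound mailbox by reading the MMIO address of the mailbox.
- When the mailbox is read through the MMIO address, the channel count is incremented by '1'.

SPU Write Outbound Interrupt Mailbox

SPE write
- Sends the message data to SPU_WrOutIntrMbox. If SPU_WrOutIntrMbox is full, the instruction blocks.
    void spu_write_out_intr_mbox(unsigned int data)
- Returns the number of messages that SPU_WrOutIntrMbox can still accept. A return value of 0 means that SPU_WrOutIntrMbox is full.
    unsigned int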
    spu_stat_out_intr_mbox(void)

PPE read
- Reads at most count available messages from the SPU_WrOutIntrMbox of the SPE identified by spe_id into the buffer that mbox_data points to, and returns the number of messages actually read. The behavior argument determines whether the call blocks.
    int spe_out_intr_mbox_read(spe_context_ptr_t spe_id, unsigned int *mbox_data,
                               int count, unsigned int behavior)
- Possible values for behavior are:
  - SPE_MBOX_ALL_BLOCKING: the call blocks until all count mailbox messages have been read.
  - SPE_MBOX_ANY_BLOCKING: the call blocks until at least one mailbox message has been read.
  - SPE_MBOX_ANY_NONBLOCKING: the call reads as many mailbox messages as possible, up to a maximum of count, without blocking.
- Returns the number of messages available in the SPU_WrOutIntrMbox of the SPE identified by spe_id. If the mailbox is empty, the return value is 0. A positive return value is the number of messages in the mailbox that have not yet been read.
    int spe_out_intr_mbox_status(spe_context_ptr_t spe_id)

Channel semantics of SPU_WrOutIntrMbox
- The value written to the SPU Write Outbound Interrupt Mailbox channel (SPU_WrOutIntrMbox) is entered into the outbound interrupt mailbox if the mailbox has capacity to accept the value.
- If the mailbox can accept the message, the channel count for SPU_WrOutIntrMbox is decremented by '1', and an interrupt is raised in the PPE or other device, depending on interrupt enabling and routing.
- There is no ordering between the interrupt and previously issued MFC commands.
- If the outbound interrupt mailbox is full, the channel count will read as '0'.
- If SPE software writes a value to SPU_WrOutIntrMbox when the channel count is '0', the SPU will stall on the write.
- The SPU remains stalled until the PPE or other device reads a mailbox message from the outbound interrupt mailbox by reading the MMIO address of the mailbox. When this is done, the channel count is incremented by '1'.

SPU Read Inbound Mailbox

PPE write
- Writes at most count messages to the SPU_RdInMbox of the SPE identified by spe_id; mbox_data points to the source data. The behavior argument determines whether the call blocks.
    int spe_in_mbox_write(spe_context_ptr_t spe_id, unsigned int *mbox_data,
                          int count, unsigned int behavior)
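- Possible values for behavior are:
  - SPE_MBOX_ALL_BLOCKING: the call blocks until all count mailbox messages have been written.
  - SPE_MBOX_ANY_BLOCKING: the call blocks until at least one mailbox message has been written.
  - SPE_MBOX_ANY_NONBLOCKING: the call writes as many mailbox messages as possible, up to a maximum of count, without blocking.
- Returns the number of messages that can be written to the SPU_RdInMbox of the SPE identified by spe_id. If SPU_RdInMbox is full, the return value is 0. A positive return value is the number of messages that the mailbox can still accept.
    int spe_in_mbox_status(spe_context_ptr_t spe_id)

SPE read
- Reads the next message from SPU_RdInMbox. If the mailbox queue is empty, the instruction blocks.
    unsigned int spu_read_in_mbox(void)
- Returns the number of valid messages in SPU_RdInMbox. A positive return value means that SPU_RdInMbox holds messages that have not yet been read.
    unsigned int spu_stat_in_mbox(void)

The mailbox is a FIFO queue
- If the SPU Read Inbound Mailbox channel (SPU_RdInMbox) has a message, the value read from the mailbox is the oldest message written to the mailbox.

Mailbox Status (empty: channel count = 0)
If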
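The FIFO behaviour of the 4-entry SPU Read Inbound Mailbox can be illustrated with a small simulation in plain C. It is a sketch under assumptions, not the hardware or the libspe2 interface: mailbox_sim, mbox_init, mbox_try_write, and mbox_try_read are hypothetical names, the functions return 0 where the real calls would block, and a write to a full mailbox is simply refused here rather than overwriting an entry.

```c
#include <stdint.h>

#define INBOUND_DEPTH 4u  /* the SPU Read Inbound Mailbox is 4 entries deep */

/* Hypothetical software model of one inbound mailbox queue. */
typedef struct {
    uint32_t     slot[INBOUND_DEPTH];
    unsigned int head;   /* index of the oldest unread message       */
    unsigned int count;  /* number of unread messages in the mailbox */
} mailbox_sim;

static void mbox_init(mailbox_sim *m)
{
    m->head = 0;
    m->count = 0;
}

/* Writer side (the PPE in the lecture): returns 1 if the 32-bit
 * message was stored, 0 if the mailbox is already full. */
static int mbox_try_write(mailbox_sim *m, uint32_t msg)
{
    if (m->count == INBOUND_DEPTH)
        return 0;
    m->slot[(m->head + m->count) % INBOUND_DEPTH] = msg;
    m->count++;
    return 1;
}

/* Reader side (the SPE in the lecture): returns 1 and stores the oldest
 * message in *out, or returns 0 if the mailbox is empty. */
static int mbox_try_read(mailbox_sim *m, uint32_t *out)
{
    if (m->count == 0)
        return 0;
    *out = m->slot[m->head];
    m->head = (m->head + 1) % INBOUND_DEPTH;
    m->count--;
    return 1;
}
```

In this model, after four writes a fifth write fails until one message has been read, and reads always return messages in the order in which they were written.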
