并行程序设计中文课件 09 并行程序通信开销

上传人：o*** IP属地：未知上传时间：2024-12-31 格式：PPT 页数：68 大小：2MB 积分：6 举报 版权申诉

已阅读5页，还剩63页未读，继续免费阅读

版权说明：本文档由用户提供并上传，收益归属内容提供方，若内容存在侵权，请进行举报或认领

文档简介

ParallelProgrammingInstructor:ZhangWeizhe(张伟哲)ComputerNetworkandInformationSecurityTechniqueResearchCenter,SchoolofComputerScienceandTechnology,HarbinInstituteofTechnologyCommunicationCostsinParallelComputers31.CommunicationCostModel2.

CommunicationCostsofDifferentTopology

2.1Point-to-PointCommunication2.2One-to-AllBroadcast&All-to-OneReduction2.3All-to-AllBroadcastandReduction

2.4ThePrefix-SumOperation2.5ScatterandGather2.6All-to-AllPersonalizedCommunication2.7CircularShift3.

SummaryOutline41.沟通成本模型2.不同拓扑的通信成本2.1点对点通信2.2一对多广播和多对一减少2.3全方位广播与减少2.4前缀和操作2.5散点和收集2.6全对所有个性化沟通2.7循环3.总结OutlineCommunicationCostsAlongwithidlingandcontention,communicationisamajoroverheadinparallelprograms.Thecostofcommunicationisdependentonavarietyoffeaturesincludingtheprogrammingmodelsemantics,thenetworktopology,datahandlingandrouting,andassociatedsoftwareprotocols.5通信开销随着空转和争用，通信是并行程序中的一大开销。通信的成本取决于各种功能，包括编程模型语义，网络拓扑，数据处理和路由以及相关的软件协议。6Thetotaltimetotransferamessageoveranetworkcomprisesofthefollowing:Startuptime(ts):Timespentatsendingandreceivingnodes(executingtheroutingalgorithm,programmingrouters,etc.).Per-hoptime(th):Thistimeisafunctionofnumberofhopsandincludesfactorssuchasswitchlatencies,networkdelays,etc.Per-wordtransfertime

(tw):Thistimeincludesalloverheadsthataredeterminedbythelengthofthemessage.Thisincludesbandwidthoflinks,errorcheckingandcorrection,etc.MessagePassingCosts7通过网络传输消息的总时间包括以下内容：启动时间（ts）：发送和接收节点所花费的时间（执行路由算法，编程路由器等）。每跳时间（th）：此时间是跳数的函数，包括切换延迟，网络延迟等因素。每字传输时间（tw）：此时间包括由消息长度确定的所有开销。这包括链路的带宽，错误检查和纠正等。消息传递成本8Store-and-ForwardRoutingAmessagetraversingmultiplehopsiscompletelyreceivedatanintermediatehopbeforebeingforwardedtothenexthop.ThetotalcommunicationcostforamessageofsizemwordstotraverselcommunicationlinksisInmostplatforms,thissmallandtheaboveexpressioncanbeapproximatedby

9存储转发路由在被转发到下一个跳转之前，在一个中间的跳转中，一个消息遍历多个跳转。一个长度为m字的消息通过l的通信链接的总通信成本是在大多数平台中，th是很小的，上面的表达式可以被近似10Cut-ThroughRoutingStore-and-forwardmakespooruseofcommunicationresources.Packetroutingbreaksmessagesintopacketsandpipelinesthemthroughthenetwork.Atracermessagefirstprogramsallintermediaterouters.Allflitsthentakethesameroute.Thetotalcommunicationtimeforcut-throughroutingisapproximatedby:11直通路由存储转发不利于通信资源。分组路由将消息分解成分组，并通过网络将它们进行管道传输。示踪消息首先对所有中间路由器进行编程。所有的fl子然后采取相同的路线。直通路由的总通信时间近似为：12SimplifiedCostModelThecostofcommunicatingamessagebetweentwonodeslhopsawayusingcut-throughroutingisgivenbyInthisexpression,thistypicallysmallerthantsandtw.Forthisreason,thesecondtermintheRHSdoesnotshow,particularly,whenmislarge.Forthesereasons,wecanapproximatethecostofmessagetransferby13简化成本模型在两个节点之间传递消息的成本l由跳过路由跳过在这个表达式中，th通常小于ts和tw。因此，RHS中的第二项并不显示，特别是当m大时。由于这些原因，我们可以近似消息传输的成本14151.CommunicationCostModel2.CommunicationCostsofDifferentTopology

2.1Point-to-PointCommunication2.2One-to-AllBroadcast&All-to-OneReduction2.3All-to-AllBroadcastandReduction

2.4ThePrefix-SumOperation2.5ScatterandGather2.6All-to-AllPersonalizedCommunication2.7CircularShift3.

SummaryOutline

DataCommunicationbetweenTwoNetworkProcessors

拓扑Store-and-ForwardRouting

Cutting-ThroughRingts+mtwGrid—torusts+mtwHypercubets+mtwBasicCommunicationOperations:IntroductionGroupcommunicationoperationsarebuiltusingpoint-to-pointmessagingprimitives.Recallfromourdiscussionofarchitecturesthatcommunicatingamessageofsizemoveranuncongestednetworktakestimets+tmw.Weusethisasthebasisforouranalyses.Wherenecessary,wetakecongestionintoaccountexplicitlybyscalingthetwterm.Weassumethatthenetworkisbidirectionalandthatcommunicationissingle-ported.17基本沟通操作：介绍组通信操作是使用点对点消息传递原语构建的。从我们对架构的讨论回想一下，在一个不成功的网络上传送大小为m的消息需要时间ts+tmw。我们用它作为我们分析的基础。如有必要，我们通过缩放tw项目来明确拥挤。我们假设网络是双向的，通信是单端口的。18One-to-AllBroadcastandAll-to-OneReductionOneprocessorhasapieceofdata(ofsizem)itneedstosendtoeveryone.Thedualofone-to-allbroadcastisall-to-onereduction.Inall-to-onereduction,eachprocessorhasmunitsofdata.Thesedataitemsmustbecombinedpiece-wise(usingsomeassociativeoperator,suchasadditionormin),andtheresultmadeavailableatatargetprocessor19One-to-AllBroadcastandAll-to-OneReduction一个处理器有一个数据（大小m）需要发送给每个人。一对多广播的双重缩减是多对一的。在多对一的减少中，每个处理器都有m个数据单元。这些数据项必须以分段方式组合（使用一些关联运算符，例如加法或最小），并将结果提供给目标处理器20Simplestwayistosendp-1messagesfromthesourcetotheotherp-1processors-thisisnotveryefficient.Userecursivedoubling:sourcesendsamessagetoaselectedprocessor.Wenowhavetwoindependentproblemsderinedoverhalvesofmachines.Reductioncanbeperformedinanidenticalfashionbyinvertingtheprocess.21最简单的方法是将p-1消息从源发送到其他p-1处理器-这不是非常有效。使用递归加倍：源向所选处理器发送消息。我们现在有两个独立的问题是机器的一半。可以通过反转该过程以相同的方式执行减少。22One-to-AllBroadcast23

All-to-OneReductionReductiononaneight-noderingwithnode

0asthedestinationofthereduction.One-to-AllBroadcast

25One-to-AllBroadcast26

All-to-AllBroadcastandReductionGeneralizationofbroadcastinwhicheachprocessoristhesourceaswellasdestination.广播的广播，其中每个处理器是源和目的地。Aprocesssendsthesamem-wordmessagetoeveryotherprocess,butdifferentprocessesmaybroadcastdifferentmessages.一个进程向每个其他进程发送相同的m-word消息，但是不同的进程可以广播不同的消息。27All-to-AllBroadcastandReductiononaRingSimplestapproach:performpone-to-allbroadcasts.Thisisnotthemostefficientway,though.Eachnodefirstsendstooneofitsneighborsthedataitneedstobroadcast.Insubsequentsteps,itforwardsthedatareceivedfromoneofitsneighborstoitsotherneighbor.Thealgorithmterminatesinp-1steps.28All-to-AllBroadcastandReductiononaRing最简单的方法：执行一对多广播。这不是最有效的方式。每个节点首先向其邻居中的一个发送需要广播的数据。在随后的步骤中，它将从其邻居之一接收的数据转发到其他邻居。该算法以p-1步骤终止。2930tcomm=(ts+twm)(p-1)

31tcomm=2ts(√p–1)+twm(p-1)All-to-allbroadcastonaHypercubeThePrefix-SumOperationGivenpnumbersn0,n1,…,np-1(oneoneachnode),theproblemistocomputethesumssk=∑ik=0niforallkbetween0andp-1.Initially,nkresidesonthenodelabeledk,andattheendoftheprocedure,thesamenodeholdsSk.ThePrefix-SumOperation

ScatterandGatherInthescatteroperation,asinglenodesendsauniquemessageofsizemtoeveryothernode(alsocalledaone-to-allpersonalizedcommunication).Inthegatheroperation,asinglenodecollectsauniquemessagefromeachnode.Whilethescatteroperationisfundamentallydifferentfrombroadcast,thealgorithmicstructureissimilar,exceptfordifferencesinmessagesizes(messagesgetsmallerinscatterandstayconstantinbroadcast).Thegatheroperationisexactlytheinverseofthescatteroperationandcanbeexecutedassuch.分散和收集在分散操作中，单个节点向每个其他节点（也称为一对一个性化通信）发送大小为m的唯一消息。在收集操作中，单个节点从每个节点收集唯一的消息。虽然散射操作与广播基本不同，但除了消息大小的差异（消息在分散中变得更小并且在广播中保持不变）之外，算法结构是类似的。收集操作完全是散射操作的倒数，可以这样执行。GatherandScatterOperationsScatterandgatheroperations.ExampleoftheScatterOperationThescatteroperationonaneight-nodehypercube.CostofScatterandGatherTherearelogpsteps,ineachstep,themachinesizehalvesandthedatasizehalves.有logp步骤，在每一步中，机器大小一半和数据大小一半。Wehavethetimeforthisoperationtobe:Thistimeholdsforalineararrayaswellasa2-Dmesh.这个时间适用于线性阵列以及2-D网格。Thesetimesareasymptoticallyoptimalinmessagesize.这些时间在消息大小上是渐近最优的。All-to-AllPersonalizedCommunicationEachnodehasadistinctmessageofsizemforeveryothernode.每个节点对于每个其他节点具有大小为m的不同消息。Thisisunlikeall-to-allbroadcast,inwhicheachnodesendsthesamemessagetoallothernodes.这与多对多广播不同，其中每个节点向所有其他节点发送相同的消息。All-to-allpersonalizedcommunicationisalsoknownastotalexchange.多对多的个性化沟通也称为总交换。All-to-AllPersonalizedCommunicationAll-to-allpersonalizedcommunication.All-to-AllPersonalizedCommunicationonaRingEachnodesendsallpiecesofdataasoneconsolidatedmessageofsizem(p–1)tooneofitsneighbors.每个节点将所有数据片段作为一个大小为m（p-1）的统一消息发送给其邻居之一。Eachnodeextractstheinformationmeantforitfromthedatareceived,andforwardstheremaining(p–2)piecesofsizemeachtothenextnode.每个节点从接收到的数据中提取其意图的信息，并将每个大小为m的剩余（p-2）个片段转发到下一个节点。Thealgorithmterminatesinp–1steps.该算法以p-1步骤终止。Thesizeofthemessagereducesbymateachstep.每个步骤消息的大小减少m。

All-to-AllPersonalizedCommunicationonaRingAll-to-AllPersonalizedCommunicationonaRing:CostWehavep–1stepsinall.Instepi,themessagesizeism(p–i).Thetotaltimeisgivenby:All-to-AllPersonalizedCommunicationonaMeshEachnodefirstgroupsitspmessagesaccordingtothecolumnsoftheirdestinationnodes.All-to-allpersonalizedcommunicationisperformedindependentlyineachrowwithclusteredmessagesofsizem√p.Messagesineachnodearesortedagain,thistimeaccordingtotherowsoftheirdestinationnodes.All-to-allpersonalizedcommunicationisperformedindependentlyineachcolumnwithclusteredmessagesofsizem√p.All-to-AllPersonalizedCommunicationonaMesh每个节点首先根据目的节点的列对其p个消息进行分组。在每一行中，使用大小为m√p的群集消息独立执行多对多通信。每个节点中的消息将重新排序，此时根据目标节点的行。每个列中都会独立执行多对多的个性化通信，其中集群消息的大小为m√p。All-to-AllPersonalizedCommunicationonaMeshAll-to-AllPersonalizedCommunicationonaMesh:CostTimeforthefirstphaseisidenticaltothatinaringwith√pprocessors,第一阶段的时间与具有√p处理器的环相同，i.e.,(ts+twmp/2)(√p–1).Timeinthesecondphaseisidenticaltothefirstphase.第二阶段的时间与第一阶段相同。Therefore,totaltimeistwiceofthistime,i.e.,All-to-AllPersonalizedCommunicationonaHypercubeGeneralizethemeshalgorithmtologpsteps.Atanystageinall-to-allpersonalizedcommunication,everynodeholdsppacketsofsizemeach.Whilecommunicatinginaparticulardimension,everynodesendsp/2ofthesepackets(consolidatedasonemessage).Anodemustrearrangeitsmessageslocallybeforeeachofthelogpcommunicationsteps.All-to-AllPersonalizedCommunicationonaHypercube广义网格算法记录p步。在所有个人通信的任何阶段，每个节点都保存每个大小为m的p个数据包。在特定维度进行通信时，每个节点都会发送这些数据包的p/2（统一为一个消息）。节点必须在每个日志p通信步骤之前在本地重新排列其消息。All-to-AllPersonalizedCommunicationonaHypercubeAll-to-AllPersonalizedCommunicationonaHypercube:CostWehavelogpiterationsandmp/2wordsarecommunicatedineachiteration.Therefore,thecostis:Thisisnotoptimal!All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmEachnodesimplyperformsp–1communicationsteps,exchangingmwordsofdatawithadifferentnodeineverystep.Anodemustchooseitscommunicationpartnerineachstepsothatthehypercubelinksdonotsuffercongestion.Inthejthcommunicationstep,nodeiexchangesdatawithnode(iXORj).Inthisschedule,allpathsineverycommunicationsteparecongestion-free,andnoneofthebidirectionallinkscarrymorethanonemessageinthesamedirection.All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithm每个节点简单地执行p-1通信步骤，在每个步骤中与不同节点交换m个数据字。节点必须在每个步骤中选择其通信伙伴，以便超立方体链路不会拥塞。在第j个通信步骤中，节点i与节点（iXORj）交换数据。在该时间表中，每个通信步骤中的所有路径都是无拥塞的，并且没有一个双向链路在同一个方向上携带多个消息。All-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmAll-to-AllPersonalizedCommunicationonaHypercube:OptimalAlgorithmAproceduretoperformall-to-allpersonalizedcommunicationonad-dimensionalhypercube.ThemessageMi,jinitiallyresidesonnodeiandisdestinedfornodej.All-to-AllPersonalizedCommunicationonaHypercube:CostAnalysisofOptimalAlgorithmTherearep–1stepsandeachstepinvolvesnon-congestingmessagetransferofmwords.有p-1个步骤，每个步骤涉及m个词的非拥塞消息传递。Wehave:CircularShiftAspecialpermutationinwhichnodeisendsadatapackettonode(i+q)modpinap-nodeensemble (0≤q≤p).节点i在p节点集合中向节点（i+q）modp发送数据分组的特殊排列（0≤q≤p）。CircularShiftonaMeshTheimplementationonaringisratherintuitive.Itcanbeperformedinmin{q,p–q}neighborcommunications

人人文库> 全部分类> 教育资料 > 课件下载

温馨提示

1. 本站所有资源如无特殊说明，都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
2. 本站的文档不包含任何第三方提供的附件图纸等，如果需要附件，请联系上传者。文件的所有权益归上传用户所有。
3. 本站RAR压缩包中若带图纸，网页内容里面会有图纸预览，若没有图纸预览就没有图纸。
4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
5. 人人文库网仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对用户上传分享的文档内容本身不做任何修改或编辑，并不能对任何下载内容负责。
6. 下载文件中如有侵权或不适当内容，请与我们联系，我们立即纠正。
7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

并行程序设计中文课件 09 并行程序通信开销

文档简介

温馨提示

最新文档

评论

并行程序设计 中文课件 09 并行程序通信开销

文档简介

温馨提示

最新文档

评论

相关文档

并行程序设计中文课件 09 并行程序通信开销