已阅读5页,还剩17页未读, 继续免费阅读
版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
J Supercomput (2014) 70:488509DOI 10.1007/s11227-014-1264-0Exploiting controlled-grained parallelismin message-driven stream programsYan Su Feng Shi Shahnawaz Talpur Jin Wei Hai TanPublished online: 30 July 2014 Springer Science+Business Media New York 2014Abstract With the increasing amount of parallelism obtainable on multicore plat-forms, stream programming has been proposed as an effective solution for exposingdistributed parallelization. Nonetheless, a pressing demand of scheduling task and dataparallelism in stream programming exists that can accomplish robust multicore per-formance in the face of varying application characteristics. This paper addresses theproblem of scheduling task and data parallelism in stream programming. We presentStreamMDE, an asynchronous concurrency stream programming framework whichoffers a novel parallel programming model for scheduling task and data parallelism inthe message-driven execution paradigm. A key property of this framework is exposingcontrolled-grained parallelism, which allows us to control the granularity of task anddata parallelism in stream graph. Our empirical evaluation of StreamMDE shows thathigher efciency of mixed task and data parallelism in stream programming can beexploited with the appropriate granularity control. The framework bridges the gapbetween the parallel scale and the architecture of stream programs and facilitates indesigning and coding stream features in different schedules.Y. Su F. Shi S. Talpur J. Wei (B) H. TanBeijing Institute of Technology, Beijing, Chinae-mail: Jin.SY. Sue-mail: F. Shie-mail: S. TalpurMehran University of Engineering and Technology, Jamshoro, Sindh, Pakistane-mail: H. Tane-mail: h_123Exploiting controlled-grained parallelism in message-driven stream programs 489Keywords Controlled-grained parallelism Multicore Message-driven execution C+11 Stream programming1 IntroductionMulti-core processors enter in the mainstream and present many challenges in paral-lel programming 14. To exploit more potential parallelism of these architectures,stream programming has been proposed as an effective solution to expose distrib-uted parallelism. This style of programming has not only been used to write imageand video processing applications, but also evolved into a more general-purpose modelencompassing a broader set of data-intensive applications including irregular scienticapplications 5. Stream programming models such as Brook 6, Baker 7, CUDA8, StreamIt 9 are high-level programming models. These models are widely usedin several application domains of graphics, voice and signal processing. However,in these stream programming models, the costs of communication and synchroniza-tion affect scheduling directly and substantially affect parallel execution efciency.Controlling multiprocessor scheduling is crucial not only for performance but alsobecause data races enable scheduling to affect a programs functional behavior 10.In stream programming, scheduling can play a tremendous role by selecting thread-tocore mappings that either facilitate cooperative inter-thread data sharing or mitigateshared resource contention between competing threads 11.Message-driven models are suitable solutions for scheduling in the parallel-distributed computing environment of stream programming 12,13. In this paradigm,asynchronously invoking a method on a remote object can also be thought of as sendinga ”message” to it 14. It approaches the parallel programming problem, alternativefrom thread-based paradigm to asynchronous concurrency execution. In message-driven models, there are many entities (e.g., objects, user level threads, handlers, etc.)communicating between each other. The runtime system activates the entity when thedata it is waiting for are available 15. The data, control information, and actions aretransferred in messages. Expression of the message-driven execution (MDE) paradigmsuch as Charm+ 14 and AM+ 16 supports both irregular and regular applica-tions, and presents a novel programming and runtime system. To schedule the taskand data parallelism efciently in stream programs, we study incorporating messagepassing mechanisms into the stream programs for exposing distributed parallelism.In this paper, we propose a new methodology called StreamMDE. This methodol-ogy is an asynchronous concurrency stream programming framework which providesa logical approach to exploit controlled-grained parallelism using the message mech-anism. In contrast to the synchronous stream models 1719, this message-drivenasynchronous stream model offers a promising approach for exposing parallelism in aspecied scale suitable for multicore architectures. To expose more low-level detailsassociated with sequential consistency to programmers, we implement the stream pro-gramming framework based on the multi-threading memory model of C+11 standard.The contributions of this paper are the following: Message-Driven Stream Programming Model: We present a stream programmingmodel for scheduling task and data parallelism based on the message passing par-123490Y. Su et al.adigm. The programming model decomposes the stream computation procedureinto two main processes: the divergence process and the convergence process. Thestream framework provides programmers with two basic classes (chain and joiner)to construct the stream graph.Effect Deterministic Communication: To control the hazards associated with com-munication and synchronization in stream programs, we propose the Effect Deter-ministic Communication (EDC) technique for dynamically adjusting the streamgraph and reconnecting the corresponding channels during the execution. It alsosignicantly reduces the number of critical state check.Message Integration Oriented Virtual Channel: To reduce the costs of buffering andlatency associated with excessive data parallelism, StreamMDE offers an effec-tive approach to schedule the granularity of data parallelism. It uses a data paral-lelism paradigm to parse the integrated message and allocates data blocks throughmessage-based virtual data channels.Controlled-Grained Parallelism: Combining the EDC and Message Integration-Oriented Virtual Channel (MIO-VC) techniques, StreamMDE offers a newapproach for exploiting parallelism in an appropriate granularity suitable for mul-ticore architectures. The approach bridges the gap between the parallel scale andthe architecture of stream programs.Evaluation: We present an empirical evaluation of our system. We show thatStreamMDE programs outperform the corresponding stream program with statictask parallelism or unlimited data parallelism. We also demonstrate the necessityfor balancing task parallelism and data parallelism in stream programs.2 Related worksThere are plenty of languages and frameworks for scheduling parallelism in streamprograms describes in literature. Buck et al present Brook, a system for general-purposecomputation on programmable graphics hardware. As an extension of standard ANSIC, the Brook language incorporates the ideas of data parallelism with arithmetic ef-ciently. It presents a compiler and runtime system that abstracts and virtualizes manyaspects of graphics hardware 6. The SHIM concurrent programming language adoptsdeterministic message passing it as sole communication mechanism and guaranteesscheduling independence. It adopts an asynchronous concurrency model. Only com-munication affects the relative execution rates of concurrent tasks 10. Braid is anobject-oriented language which supports both task and data parallelism on MultipleInstruction Stream Multiple Data Stream (MIMD) machines. It is a set of data parallelextensions to the Mentat Programming Language (MPL) which allows programmersto integrate task parallelism, data parallelism, nested task and data parallelism within asingle language on top of a single runtime system 20. With the synchronous dataowmodel of execution 21, the StreamIt language provides high-level representationsto improve programmer productivity and program robustness within the streamingdomain 22. It places more emphasis on exposing task and pipeline parallelism andfocuses on well-structured programs that can be aggressively optimized 23. Gordonet al. present an end-to-end stream compiler that attains robust multicore performance123Exploiting controlled-grained parallelism in message-driven stream programs 491in the face of varying application characteristics. The StreamIt compiler performs aCoarse-Grained method to exploit task, data and pipeline parallelism for stream pro-grams. It bridges the gap between the original granularity of the program and theunderlying granularity of the architecture 24. Spring et al. present a programmingmodel called StreamFlex for real-time streaming which allows developers to writestream processors in Java. StreamFlex lters are isolated components that communi-cate by non-blocking bounded channels. Software transactional memory is used forsynchronization of shared data channels 26. Ruggiero et al. propose a complete allo-cation and scheduling framework, where an MPSoC 27 virtual platform is used toaccurately derive input parameters, validate abstract models of system componentsand assess constraint satisfaction and objective function optimization. The optimizerimplements an exact approach for scheduling parallelism based on problem decompo-sition. The allocation subproblem is solved through Integer Programming 28 whilethe scheduling one through Constraint Programming 29.However, from the above-related works it could be identied that current researchesin scheduling parallelism of stream processing mainly focus on the synchronousdata ow model. None of these systems provides an efcient way of relationshipbetween exploiting multiple level parallelism and application structure. If taking intoaccount the varying application characteristics, a more exible approach for bridgingthe gap between the parallel scale and the architecture is needed in streaming appli-cations. Compared to the conventional languages and frameworks, our work placesmore emphasis on exploiting task and data parallelism based on the message pass-ing paradigm. By adopting the asynchronous MDE model, StreamMDE provides aexible architecture which enables adjusting the granularity of parallelism associatedwith application structure. In this research work, general techniques are developed forscheduling parallelism in stream programs.3 StreamMDE programming framework3.1 Programming modelStreamMDE is a C+11-based parallel stream programming framework, foundedon the Divergence-Convergence programming model, which exploit parallelismusing message-driven mechanism. The key feature of the programming model iscomputation-oriented decomposition in two steps. As shown in Fig. 1, the rst step isthe divergence process. The programmer decomposes the stream computation proce-dure into a number of computation units and denes the hierarchical relationship andfunction of the computation units. Correspondingly, the convergence process consistsof a number of merge units. Each merge unit maps onto one or more computation units.The merge units collect and calculate the computation results of the computation units.It is like a barrier to synchronize the procedure execution in the distributed and asyn-chronous environment. A StreamMDE program is a collection of independent unitswhich communicate by the means of uni-directional message channels. This modellends itself naturally to concurrent and distributed implementations for schedulingparallelism.123492 Y. Su et al.3.2 Message-driven execution mechanismStreamMDE makes available two hierarchical primitive nodes for composing com-putation units into larger stream graph as shown in Fig. 1. The basic computationalunit of the divergence phase in StreamMDE is a chain node in which a single inputand multiple outputs are driven by asynchronous messages. The basic computationalunit of the convergence phase in StreamMDE is a joiner node, which has multipleinputs and a single output to merge the computation results asynchronously. There areseveral chain nodes and joiner nodes distributed on available cores of the system inStreamMDE programs that interact with each other via asynchronous method invoca-tions. The invocation procedure can be supposed as posting a “message” to the remoteobject. Thus, the transmission of parameters and data stream between chain nodes isdriven by messages. The joiner node receives the messages from the correspondingchain nodes and merges the distributed computation results in its own executive body.Through this process, the joiner node synchronizes multiple distributed computationalmodules. It is akin to a barrier to control the computation ow to conrm the startingpoint of the next level of the stream computation.The stream program is a collection of nodes (chain nodes or joiner nodes) connectedby message channels. Each node is a functional unit that consumes data from itsinput messages and produces results on its output messages. It is an implementationof a chain node or a joiner node that inherits from a thread-based class. As nodesare independent and isolated from one another, they have separate message queuesand independent address spaces. The communication channels are translated into theFig. 1 DivergenceConvergence programming model123Exploiting controlled-grained parallelism in message-driven stream programs 493message queue to pass the input and output stream parameters. The nodes are groupedto make partitions and scheduled to different stacks to exploit task and data parallelism.The manager of StreamMDE maintains a work thread pool for all nodes in the streamgraph and schedules each node (chain node or joiner node) using EDC method (seeSect. 5.1).4 Design exploration with StreamMDEStreamMDE implements the C+11 language syntax structures based on the ISOC+11 standard 30. It is a hierarchical and scalable framework for stream program-ming, which facilitates stream features in the designing and coding of different sched-ules. In this section, we introduce the algorithmic development process of BlockedMatrix Multiplication 24 in StreamMDE.4.1 The C+11 languageC+11 is the ISO C+ standard ratied in 2011. It standardizes the support of multi-threaded programming, which is the major update of the new version. C+11 offers amemory model which allows multiple threads to co-exist in a program. The memorymodel denes when multiple threads may access the same memory location and speci-es when updates by one thread become visible to other threads 15; it is an agreementbetween computer architects and compiler developers to ensure that most program-mers are not required to consider the details of modern computer hardware 31. In thenew memory model, execution of two threads can update and access separate memorylocations without interfering each other 31. Using atomic operations (locks, mutualexclusions mutexes, or atomic variables) on memory locations, no data race occursin the program. The sequential consistency for data race free programs limits the opti-mization (e.g., out-of-order execution) associated with the variables which are neededto synchronize between threads. Additionally, this guarantees that the execution orderfollows the sequential consistency model. Thus, the multithreading memory model ofC+11 standard establishes a balance between programmability and performance.In this work, each computational unit requires a separate message queue and an inde-pendent address space to communicate with each other by asynchronous messages.This relies on the multi-threading memory model of C+11 standard, which offerssufcient atomic operating (locks, mutual exclusions mutexes, or atomic variables)and better synchronization mechanism for high-performance computing. Comparedto the traditional mutex lock, the new type atomic in 32 can be used to implementquality parallel algorithm (e.g., Lock Free), which achieves better performance inshared-memory multi-processors programming. Using the atomic type to perform thesynchronization work in stream graph, a deterministic message-passing concurrencymechanism is established in the Divergence-Convergence programming model (seeSect. 3). Stream data are processed properly and continuously in the message-drivenstream graph as in the synchronization paradigm.123494Fig. 2 The process owdiagram of Blocked MatrixMultiplicationY. Su et al.4.2 Constructing stream graphIn this section, the development process of Blocked Matrix Multiplication was pre-sented. Figure 2 illustrates the process ow diagram. To construct the stream graph,we partitioned the computing process into several independent modules. Then, wedened the hierarchical relationship and functionality of each module. As previouslydescribed, the stream framework provides programmers with two basic classes: chainand joiner. The derived class of chain is implemented according to the order of instan-tiation. The derived class of joiner is used to synchronize the multiple distributedcomputational units. Figure 3 illustrates the implementation of the parallel algorithmsin StreamMDE. The Blocked Matrix Multiplication algorithm consists of six work-ing units, such as DataPartition, BlockSplit, Transpose, BlockMultiply, BlockAdd andBlockCombine. The DataPartition unit allocates the data of two matrices into differentpipelines. The Transpose unit transposes the multiplier matrix before splitting. In theBlockSplit unit, the primitive matrices are split into several partitioned matrices toexploit task parallelism in stream programs. Multiple BlockMultiply units execute the123Exploiting controlled-grained parallelism in message-driven stream programs 495Fig. 3 The implementation of Blocked Matrix Multiplicationmultiplication of distri
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 调研报告:政府采购项目履约验收存在的问题及建议
- 上海市公有住房使用权转让合同
- 窗帘购销合同2024年
- 三人养殖合作合同范本2024年
- 企业税务筹划合同
- 工程建设项目借款合同
- 兼职会计人员合同模板
- 七年级下册第一单元青春时光(知识梳理)-2023年中考道德与法治一轮复习
- 二手车交易合同签订指南
- 统编版语文七年级上册(2024)第7课《散文诗二首》知识清单(学案)
- 人教版八年级英语下册各单元知识点汇总
- 一起电动自行车火灾事故原因认定和分析
- 广东省广州市2023-2024学年高一上学期1月期末英语英语试题(解析版)
- 【教材】第四讲电影案例景别分析
- 2023~2024学年度上期高中2023级期末联考政治双向细目表
- 强制性标准执行情况检查表
- 智慧校园音视频系统整体解决方案
- 采购流程自动化与数字化转型方案
- 视觉传达专业大学规划书
- 高速铁路牵引网故障测距原理讲述
- 《中国人口老龄化》课件
评论
0/150
提交评论