




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1、Session 4: High Level Synthesis 1FPGA18, February 2527,Monterey, CA, USAAScalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT FormulationSteve Dai, Gai Liu, Zhiru ZhangSchool of Electrical and Computer Engineering, Cornell University, Ithaca, NYhd273,gl387,zhiruzco
2、ABSTRACTDespite increasing adoption of high-level synthesis (HLS) for its design productivity advantage, success in achieving high quality-of- results out-of-the-box is often hindered by the inexactness of the common HLS optimizations. In particular, while scheduling forms the algorithmic c
3、ore to HLS technology, current scheduling algo- rithms rely heavily on fundamentally inexact heuristics that make ad hoc local decisions and cannot accurately and globally optimize over a rich set of constraints. To tackle this challenge, we propose a scheduling formulation based on system of intege
4、r difference con- straints (SDC) and Boolean satisfiability (SAT) to exactly handle a variety of scheduling constraints. We develop a specialized scheduler based on conflict-driven learning and problem-specific knowledge to optimally and efficiently solve the resource-constrained sched- uling proble
5、m. By leveraging the efficiency of SDC algorithms and scalability of modern SAT solvers, our scheduling technique is able to achieve on average over 100x improvement in runtime over the integer linear programming (ILP) approach while attaining optimal latency. By integrating our scheduling formulati
6、on into a state-of-the- art open-source HLS tool, we further demonstrate the applicability of our scheduling technique with a suite of representative benchmarks targeting FPGAs.ACM Reference format:Steve Dai, Gai Liu, Zhiru Zhang. 2018. A Scalable Approach to Exact Resource- Constrained Scheduling B
7、ased on a Joint SDC and SAT Formulation. In Pro- ceedings of 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Monterey, CA, USA, February 2527, 2018 (FPGA 18), 10 pages. /10.1145/3174243.3174268for tackling the designproductivity gap 19. HLS raises the abstrac-
8、 tion of input from HDL to software programming language by pro- viding the capability to automatically synthesize untimed high-level software programs into cycle-accurate RTL implementations. Lower design complexity and faster simulation speed enable shorter time- to-market, which is especially rel
9、evant in todays rapidly-evolving technology landscape. Most recently, HLS has successfully acceler- ated the design of complex and realistic applications 22, 30, 37 as well as system-on-chip 1. The productivity advantage has led to growing adoption of commercial and open-source HLS tools, includ- in
10、g Vivado HLS 7and LegUp4.Because HLS transforms an untimed, possibly sequential, descrip- tion with no concept of clock into a timed parallel implementation with registers, scheduling has been recognized as oneof the most important problems in HLS. Scheduling extracts parallelism from the input high
11、-level program and determines the clock cycle at which different computation and communication operations should be ex- ecuted. With exclusive control on timing at the front-end of the hardware flow, scheduling is in a unique position to influence the micro-architecture and quality of the generated
12、hardware. Neverthe- less, finding an optimal schedule is intractable in general, and thus necessitates a tradeoff between optimality and efficiency.For example, HLS traditionally solves the classic resource- constrained scheduling problem, which minimizes latency given a limited number of functional
13、 units of each type. It is an NP-hard problem which can be optimized exactly with integer linear program- ming (ILP). However, it is typically approximated using heuristics for better scalability. One heuristic is list scheduling, a construc- tive algorithm that sorts ready operations based on an es
14、tablished priority and schedules them oneclockcycle at a time considering resource availability 27. It is a fast local optimization algorithm for minimizing latency under resource constraints, albeit sub-optimally. State-of-the-art HLS tools typically employ the more versatile sched- uling heuristic
15、 based on system of integer difference constraints (SDC) 8. SDC-based scheduling is rooted in a linear programming formulation and can globally optimize over design constraints that can be represented in the integer difference form (e.g., cycle time constraints, latency constraints). Notably however
16、, resource con- straints must be heuristically transformed into integer difference form to be considered. As a result, SDC-based scheduling is unable to optimally handle resource constraints.While scheduling heuristics are fast and scalable, they are funda- mentally inexact with no guarantee on opti
17、mality. First, scheduling heuristics are designed to consider only a restrictive set of constraints and are unable to handle more complex scheduling problems. Second, they lack the ability to perform global optimization and may miss valuable optimization opportunities that can otherwise be discov- e
18、red by exact techniques. In some cases, these challenges introduce a quality-of-results (QoR) gap whose severity remains unknown to both the designer as well as the tool itself. This gap may be exacer- bated as the quantity and variety of constraints increase for HLS to accommodate emerging applicat
19、ion domains.To address these challenges, we propose a scheduling formulation based on SDC coupled with Boolean satisfiability (SAT) to exactly1INTRODUCTIONThe breakdown of Dennard scaling has led tothe rapid growth of specialized hardware accelerators to meet the ever more stringent performance and
20、energy requirements. However, great performance- per-wattcomes atthe costofenormousdevelopmenteffort. Withthe traditional register-transfer-level (RTL) design flow, designers must constantly wrestle with low-level hardware description languages (HDLs) and manually explore a large multidimensional so
21、lution space. With the RTL design methodology, it is difficult to re-target multiple design points because the timing and micro-architecture are essentially fixed by design.As the process of RTL optimization becomes unequivocally dif- ficult, if not already unsustainable, high-level synthesis (HLS)
22、has emerged asa promising alternative tothe RTLdesignmethodology Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this no
23、tice and the full citation on the first page. Copyrights for components of this work owned by others than ACM mustbe honored. Abstracting withcredit is permitted. Tocopy otherwise, orrepublish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request p
24、ermissions from .FPGA 18, February 2527, 2018, Monterey, CA, USA 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5614-5/18/02. . . $15.00/10.1145/3174243.3174268137Session 4: High Level Synthesis 1FPGA18, February 2527,Monterey, CA, USAmodel a rich set o
25、f scheduling constraints. Inspired by satisfiabil- ity modulo theory (SMT) 13, our proposed approach exploits the efficiency of SDCwhile leveraging the scalability ofmodern SAT solvers to quickly prune away infeasible schedule space and derive optimal schedule. Our scheduling technique aims to push
26、the limit on what is practically scalable with exact scheduling as well as the variety ofconstraints thatcanbe efficiently encoded andsolved. Our specific contributions are as follows:Resource constraint: 2 memory read ports availablev1 ld v2 ld 3nss0 s4 0 s1 s3 0 s2 s3 0ldv0v3 +1nsDependence constr
27、aintss s 034s4 s5 0v + 1ns(1)We propose a novel resource-constrained scheduling formula- tion, which combines SDC and SAT problems, to exactly and efficiently encode both resource and timing constraints in HLS. We devise an exact yet fast resource-constrained scheduling al- gorithm for HLS based on
28、conflict-driven learning by leveraging the efficiency of SDC and scalability ofmodern SATsolvers. We employ problem-specific knowledge to specialize our schedul- ing algorithm to enable optimization and incremental scheduling techniques that further improvescalability.We apply our specialized schedu
29、ler within the open-source HLS tool LegUptoefficiently synthesize high-quality RTLforarange of representative benchmarks targeting FPGAs.4s s -1Cycle time constraints25s1 s5 -1(b)st 1nsv5(2)(a)Figure 1: Motivational and running example for this paper (a) DFG for our example. Delay of each operation
30、type is indicated next to the corresponding node. Resource constraint denotes that only two memory read ports are available. No resource constraints are imposedon add or store operations. (b) Dependence constraints and cycle time constraints corresponding to the DFG for a target clock period of 5ns.
31、(3)(4)The rest of this paper is organized as follows: Section 2 provides background on scheduling and relevant theories, as well as motiva- tion for our approach; Section 3 details our scheduling formulation; Section 4 describes our specialized conflict-driven scheduler; Sec- tion 5 presents experim
32、ental results; Section 6 provides related work and additional discussions, followed by conclusions in Section 7.2 PRELIMINARIESA typical HLS flow employs asoftware compiler (e.g., LLVM, GCC) to compile the input high-level program into a control data flow graph (CDFG) on which scheduling is then per
33、formed. In this paper, we focus on the resource-constrained scheduling problem, which is also a classic optimization problem in operation research. In the context of HLS, the problem is described as follows:Given: (1) A CDFG G(VG , EG ) where VG represents the set of operations in the CDFG and EG re
34、presents the set of edges; (2) A set of scheduling constraints, which may include dependence constraints, resource constraints, cycle time constraints, and relative timing constraints.Objective: Construct aminimum-latency schedule sothat every operation is assigned to at least one clock cycle while
35、satisfying all scheduling constraints.We illustrate the three types of scheduling formulation using the data flow graph (DFG) in Figure 1(a). As our running example, we would like to schedule the DFG targeting a clock period Tclk of 5ns. We assume that each add or store operation incurs adelay of 1n
36、s, and each load operation incurs adelay of 3ns. Wefurther assume that only two memory read ports are available, so at most two load operations can be scheduled within the same cycle. add and store operations are unconstrained.2.1 SDC-Based FormulationSDCisasystem of inequality constraints in the in
37、teger difference form xi xj bij , where bij is an integer, and xi and xj are vari- ables. The system is feasible if there exists a solution that satisfies all inequalities in the system. Because of the restrictive form of the con- straints, SDC can be solved efficiently. For SDC-based scheduling 8,
38、aschedule variable si isdeclared for each operationi in the CDFG to denote the clock cycle at which operation i is scheduled. All SDC scheduling constraints are then expressed in the integer difference form so that the system consists of a totally unimodular constraint matrix over which an optimal i
39、nteger solution can be guaranteed in polynomial time. For resource-constrained scheduling, we minimizethe objective l such that l si i, wherel representsthelatencyof the design.To handle data dependence, SDC creates the following difference constraint foreach data edge from operation i to operation
40、j inG.si sj 0(1)Inour example, because there is an edge from nodev0 to node v4,SDC will impose the difference constraint s0 s4 0 to ensure that v4 is scheduled no earlier than v0. Similar constraints are con- structed for other data dependence edges. To honor the target clock periodTclk , SDC identi
41、fies the maximum critical combination delay D(ccp(vi ,vj ) between pairs of operations i and j and constructs the following different constraint to ensure that the combinational path withtotaldelay exceeding the target cycle timeTclk must be partitioned into D(ccp(vi ,vj )/Tclk number of clock cycle
42、s.1)si s j ( D( ccp(v i, vj)/Tclk(2)In our example, because the maximum critical delay from v2 to v5 (D(ccp(v2,v5) = 6ns) exceeds the target clock period of 5ns, SDC will impose the constraint s2 s5 1 to ensure that v5 is sched- uled at least one cycle after v2. Similar constraints are imposed for c
43、ombinational paths fromv1 tov5 andv0 tov5. Theaforementioned dependence and cycle time constraints are indicated in Figure 1(b). While SDCis able tomodel timing constraints exactly, itmust heuristically transform resource constraints into the integer differ- ence form by imposing a particular heuris
44、tic linear ordering on the resource-constrained operations. This process separates resource- constrained operations appropriately into different cycles to ensure that sufficient resources are available to execute operations sched- uled within the same cycle. The linear ordering consists of a set of
45、precedence relationships between pairs of resource-constrainedoperations i and j represented in the form ofsi sj Li(3)where Li denotes the latency (in cycles) of operation i. Although the linear ordering results in a legal schedule that satisfies all re- source constraints, the schedule is likely su
46、b-optimal because the linear ordering is devised heuristically. There are many possible such legal linear orderings, some resulting in better schedules than others. However, SDC can simply pick one particular linear ordering heuristically and without knowledge ofwhether it isoptimal.138Session 4: Hi
47、gh Level Synthesis 1FPGA18, February 2527,Monterey, CA, USAwhere L denotes the maximum latency. Because si isanalogoustothe corresponding schedule variable in SDC, dependence constraints in ILP can be equivalently represented as the difference between pairs of schedule variables asin Eq. 1.Forour ex
48、ample, wecansafely assume a maximum start time equal to the number of operations N = 6.Resource constraint: 2 memory read ports availables0 s4 0s0 s4 0v2lds s 0vvv1ld2ld13s2 s3 0 s3 s4 0 s4 s 5 0 s2 s5 -11lds s 013s2 s3 0 s3 s4 0s4 s 5 0 s2 s5 -1ldldv0v0It follows that we declare variables x 0,0x0,1
49、 x02, 0x3,0x4 ,0x5 forv3 +v3 +operationv0 anddenote that s0 = 61 t x0.tVariables are similarlyt =0declared andderived foroperationsv1 tov5 . Theobjective is same as that defined in Section 2.1 for the SDC formulation.Unlike in SDC, resource constraints can be encoded exactly as linear constraints in
50、 ILP. To ensure that the number of active op- erations of type r in clock cycle t does not exceed the number of available type-r resourcesar, the ILPformulation imposesthere- source constraint+s s -115v4v4s s -115v5 st(a)v5 st(c)(b)(d)Figure 2: Partial ordering edges are heuristically imposed on the
51、 DFG, and subsequently in the SDC, to satisfy the resource constraints Partial ordering edges are shown in bold, and cor- responding difference constraints are boxed. (a)-(b) represent a dif- ferent combination of partial ordering edges than (c)-(d). Minimum latency differs depending on the particul
52、ar combination.?txit ar(5)i:RTi =r t =t Liwhere RTi and Li denote the resource type and latency of operationi, respectively. F?or our example, the ILP formulation needs to impose2 2 for each clock cycle t because only twoxittheconstraintsi=0For our example, SDC must impose partial orderings among th
53、e resource-constrained load operations because only two memory read ports are available for the three load operations (v0, v1, and v2). Onone hand, SDCcanimpose anedgefrom v0 tov1 asshown in bold in Figure 2(a) to separate v0 and v1 into different cycles so that each cycle has at most two load opera
54、tions. With this heuristic partial ordering, the DFG requires at least three cycles to execute due to the critical path delay from v0 to v5. Given the target clock period of 5ns, v0 and v1, each of which incurs a delay of 3ns, must be scheduled in separate cycles given the partial ordering edge betw
55、een them. v5 cannot be scheduled in the same cycle as v1 because there is no slack remaining in the clock cycle after scheduling v3 and v4. On the other hand, if SDCinstead imposesanedgefrom v1 tov0 andanotheredge from v2 to v0 as shownin bold in Figure 2(c), the DFGcanachieve a better latency of on
56、ly two cycles while ensuring that each cycle has at most two load operations. In Figure 2, corresponding SDC constraints are shown in (b) and (d), respectively, with appended partial ordering (“resource”) constraints boxed.From this example, we see that it is necessary to enumerate allpossible combi
57、nations of partial orderings and solve an SDC for each combination of imposed “resource” edges to find the optimal (minimum-latency) schedule. However, attempting all combinations isnotscalable in thegeneral caseforanarbitrary numberofresource- constrained operations. For this reason, SDC heuristica
58、lly imposes one particular partial ordering without guarantee ofoptimality and proceed with solving the scheduling problem without regards to the effect of any sub-optimality on the solution.memoryportsareavailable. These constraints apply tothe resource- constrained load operations v0, v1, and v2 (i.e., i = 0, 1, 2). The second summation is omitted because thelatency of load operation is zero-cycle in our example. The ILP formulation also requires t
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 社区慢病管理方法
- 2025年德育个人工作方案幼儿园演讲稿
- 护理学休克病人的急救护理
- 合同履行监督与评估指南
- 术后谵妄护理个案
- 保育员培训配合教育活动
- 神达电脑人力资源机构组织
- 滨州职业学院《功能高分子》2023-2024学年第二学期期末试卷
- 内蒙古鸿德文理学院《电视演播室》2023-2024学年第二学期期末试卷
- 安徽卫生健康职业学院《形势与政策Ⅲ》2023-2024学年第一学期期末试卷
- 《用户体验测试》课件
- 隔离与防护措施的正确应用
- 高血压问卷设计(知信行模式)
- 2023年北京市丰台区初三英语一模试题及答案
- 2023青海省安全员《C证》考试题库
- 职业病危害告知书
- TRIZ理论――创新方法课件
- CORN术中获得性压力性损伤风险评估量表评定细则解读
- 预毕业证明(共5篇)
- 中国大唐集团公司以热率为核心能耗管理指导意见
- 南方科技大学自述信800字范文六篇
评论
0/150
提交评论