全局布局优化的一种增广拉格朗日方法.doc_第1页
全局布局优化的一种增广拉格朗日方法.doc_第2页
全局布局优化的一种增广拉格朗日方法.doc_第3页
全局布局优化的一种增广拉格朗日方法.doc_第4页
全局布局优化的一种增广拉格朗日方法.doc_第5页
已阅读5页,还剩7页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

精品论文全局布局优化的一种增广拉格朗日方法李维国1, 陈建利2, 朱文兴1,21 福州大学 离散数学与理论计算机研究中心,福州 3500022 福州大学 数学与计算机科学学院,福州 350108 摘要:如果忽略单元重叠,全局布局要计算出单元的最佳位置以最优化特定度量标准 (如, 线 长, 密度溢出)。全局布局影响到一个电路芯片的可布线性, 性能, 能耗等,是大规模集成电路物 理设计的一个关键环节。本文提出了一种增广拉格朗日方法来优化集成电路全局布局问题。该 方法采用了动态的密度惩罚因子放大策略来平衡线长与单元密度。将此方法与 ntuplace3 的 全局布局框架结合,并在 ibm 的混合单元测试例子进行试验。结果表明该方法能在合理时间 内得到质量更好的布局。 关键词:非线性优化;大规模集成电路;全局布局;增广拉格朗日方法中图分类号: tn47; tp301.6an augmented lagrangian method for vlsi global placement optimizationli wei-guo1, chen jian-li2, zhu wen-xing1,21 center for discrete mathematics and theoretical computer science, fuzhou university, fuzhou 3500022 college of mathematics and computer science, fuzhou university, fuzhou 350108abstract: ignoring some cell overlaps, global placement computes the best position for each cell to minimize some cost metric (e.g., total wirelength, density overow). it is a crucial step in very large scale integration(vlsi) physical design, since it aects routability, performance, and power consumption of a circuit. in this paper, we propose an augmented lagrangian method to solve the vlsi global placement. in this method, a cautious dynamic density weight increasing strategy is used to balance the wirelength and density constraint. we incorporated our method into ntuplace3s global placement framework, and tested it on the ibm mixed-size benchmark circuits. experimental results show that it obtains high-quality results in a reasonable running time.key words: nonlinear optimization; vlsi; global placement; augmented lagrangian method基金项目: national natural science foundation of china (61170308); national key basic research special foundation of china (2011cb808000)作者简介: li wei-guo(1987-),male,graduate student,major research direction:vlsi global placement algorithm.chen jian-li(1985-), male,lecturer,major research direction:vlsi lorplanning/placement algorithm. correspondence author:zhu wen-xing(1968-),male,professor,major research direction:optimization theory and application, algorithm in vlsi physical design automation.- 5 -0 introductionthe vlsi placement problem involves placing a set of cells into a xed die for a given netlist, such that there are no overlaps among objects and some cost metric (e.g., wirelength, density overow) is optimized 1 . since the problem is np-complete 2, 3 , and designs with millions of cells are now common, it is a challenge to design ecient algorithms for producing high quality placement solutions 4 . since circuit performance heavily depends on placement results, the placement problem has attracted much attention recently, and many new academic placers were invented in recent years 516. those placers can be classied into three major categories 17 : meta-heuristic methods 5, 6, min-cut 79, and analytical algorithms 1016. among those methods, the analytical placers have shown their superior eciency and quality.the analytical approach placers consist of three major steps: global placement, legalization and detailed placement 1, 17, 18 . ignoring some circuit elements overlaps, global placement com- putes the best position for each circuit element. then, legalization removes all cell overlaps for each region. finally, detailed placement renes the placement solution 17 . global placement is generally considered as the most important step, due to its crucial impact on the overall placement quality 4, 19 . state-of-the-art algorithms for global placement form two families: (i) force-directed placers, such as kraftwerk2 10, fastplace3 11, rql 12 and simpl 13, and (ii) non-linear optimization techniques, such as aplace2 14, ntuplace3 15 and mpl6 16. algorithms in both categories are directly used in the industry or closely resemble those in industry placers 13 . tools based on non-linear optimization techniques achieve the best results reported for academic implementations 20 and eda vendor tools 13 . among these tools, n- tuplace3 15 obtained the best hpwl (half perimeter wirelength) in ispd (international symposium on physical design) 2006 placement contest 20, and took the 1st place at the dac (design automation conference) 2012 routability-driven placement contest 21.this paper is focused on nonlinear optimization in global placement which has been shown to be the most critical step in the placement ow. with the ntuplace3 15 placement model, we develop an augmented lagrangian method to solve the vlsi global placement optimization problem. the remainder of this paper is organized as follows. section 1 gives the problem statement and overviews the global placement method of ntuplace3. augmented lagrangian method and modications to ntuplace3 are explained in section 2. section 3 reports the experimental results of both methods. finally, the conclusions are given in section 4.1 preliminaries1.1 problem statementin the vlsi global placement problem, all cells in the chip are considered as vertices; all nets are considered as hyperedges 17 . given a design with n cells and m nets, the netlist can be formulated as a hypergraph h (v, e), where v is the set of vertices, v = v1 , v2 , . . . , vn, and e is the set of hyperedges, e = e1 , e2 , . . . , em. the coordinate (xi, yi) denotes the center point of the cell vi, and wi and hi denotes the width and height of the cell vi respectively. let ai be the area of the cell vi, i = 1, 2, , n. the design region is a rectangle denoted by (0, 0) and (w, h ). we intend to pack all the cells into the design region and determine their optimal positions so that the total wirelength is minimized and there is a little overlaps among cells 17 .to evenly distribute the blocks, the placement region was divided into uniform non- overlapping bin grids. then, the global placement problem can be formulated as a constrained minimization problem as follows 15, 17 :min w (x, y), s.t. db(x, y) mb, f or each bin b,(1)where the wirelength w (x, y) is dened as the total half perimeter wirelength (hpwl) 17,| |w (x, y) = ( maxvi ,vj exixj + maxvi ,vj e|yi yj |). (2)eedb(x, y) is the potential function that is the total area of movable blocks in bin b, and mb is the maximum allowable area of movable blocks in bin b. mb can be computed by mb = tdensity (wbhb pb), where tdensity is a user-specied target density value for each bin, wb(hb) is the width (height) of bin b, and pb is the base potential equal to the pre-placed block area in bin b. note that mb is a xed value as long as all pre-placed block positions are given and the bin size is determined 15, 17 .given v and e, the objective of the placement problem is to nd a placement such that the cost function w (x, y) is minimized and the constraints are satised.1.2 nonlinear method in ntuplace3s global placementthe global placement of ntuplace3 is based on the multilevel framework which applies a two-stage technique of bottom-up coarsening followed by top-down uncoarsening 15 . the coars- ening stage iteratively clusters blocks based on connectivity/block size to reduce the problem size until the problem size is below a given threshold. then, an initial placement is computed through quadratic programming. in the uncoarsening stage, it iteratively declusters the blocks and rene the block positions to reduce the wirelength. the declustering process continues untilthe nal placement is found. during the uncoarsening stage, ntuplace3 15 applies the ana- lytical model for the global placement. since the hpwl is not dierentiable, ntuplace3 15 uses the log-sum-exp wirelength w (x, y)lse 22, to smooth the hpwl wirelength function.circuit density is controlled mainly by cell spreading during global placement and cellsliding during detailed placement. the density function db(x, y) is dened asdb(x, y) = px(b, v)py (b, v), (3)vvwhere px and py are the overlap functions of bin b and block v along the x and y directions. since db(x, y) is neither smooth nor dierentiable, the bell-shaped potential function 14, 15 is used to smooth px and py . by doing so, the non-smooth function db(x, y) can be replaced by a smooth one,d b(x, y) = cv px(b, v)py (b, v), (4)vvwhere cv is a normalization factor so that the total potential of a block equals its area 15 .in ntuplace3 15, the quadratic penalty method is used to solve a sequence of uncon- strained minimization problems,min w (x, y)lse + (db(x, y) mb)2(5)bwith increasing s. note that w (x, y)lse and db(x, y) are smoothed wirelength and density functions respectively. the solution of the previous problem is used as the initial solution for the next one. ntuplace3 15 solve the unconstrained problems by a conjugate gradient (cg) algorithm. the step size in cg is computed by a novel ecient method. after computing the cg direction dk , the step size k is calculated by k = swb/|dk |2 , where s is a user-specied scaling factor, and wb is the bin width. by doing so, the step size of block spreading canbe controlled by s when the total quadratic euclidean movement is xed, vv (xi + yi ) =22i |k dk |2 = s2 w2 , where xand yare the amount of the movement along the x and y directions2biifor the block vi in each iteration. the value of s aects the precision of objective minimization;smaller s values lead to better results and longer runtime. in our implementation, we set s to0.2 as its the default value in ntuplace3(v12.03.26).2 proposed algorithmnon-convex constraint always limits the solution quality in optimization problems. to improve the quality of global placement, we need a better framework to balance the wirelength and non-convex density functions. in this section, we rst propose a new form of augment- ed lagrangian multiplier method which could degenerate to penalty method by a reductionparameter. second, a more cautious density penalty increasing strategy is dened, aims to balance the wirelength and density constraint.2.1 reduced-augmented lagrangian multiplier methodthe conjugate gradient method in ntuplace3 15 is very ecient. we rst implemented the following traditional augmented lagrangian multiplier(alm) algorithm,klse k bb b b01. solve min w (x, y) + (dm + k )2 ,02. k+1 = k + k (d b mb), b = 1, 2, 3, ., n2 ,bbb03. k+1 = k ,04. set k k + 1 and go to step 01.and used ntuplace3s cg method as subproblem solver. however, the result comes dramat- ically worse than ntuplace3s penalty method, especially on larger-size circuits. notice that conjugate gradient method in ntuplace3 uses a user-specied scaling factor s to control total quadratic euclidean movement, rather than make an exact line search. solutions of subprob- lems obtained in this way should not be or approximate enough to their stationary points. thus multipliers updated by these solutions could be misleading.to preserve the eciency of subproblem solver, we modied the above alm algorithm asfollows:klse k bb b b 01. solve min w (x, y) + (dm + k )2 ;02. k+1 = k + k (d b mb), b = 1, 2, 3, ., n2 ,bbb03. k+1 = k ,04. set k k + 1 and go to step 01.kthe dierence is in line 01 and line 02. the multiplier term in line 01 is replaced by b ,kand we adopt a damping factor in line 02. it is more stable than only , or is used. clearlythat if subproblems are solved exactly, then it is not necessary to dene at all, or equivalently, set to 1. thus the reduction factor is a subproblem solver precision dependence parameter. particularly, if tends to innite, this alm form degenerates back to the penalty method.2.2 dynamic density weight increasing strategyexperiments revealed that the penalty increasing factor, , plays an important role in solution quality and running time. among certain range, the algorithm tends to obtain better results with smaller , but consumes more running time. further, too small of also leads to quality reduction.we involved a cautious strategy for increasing the penalty parameter among subproblems. when the blocks are separate enough to some criterion, we believe that more optimization should be made in wirelength there after. at this situation, we adopt a minor increasing factor- 5 -1 , to pay more attention in wirelength optimization. if density overow is well decreased than last iteration, we might also use a increasing factor 2 , small enough, to slow down the penalty weight increasing rate, to prevent the function from dominated by the density term. otherwise, we set to a relatively larger one, 3 , to further distribute the cells. generally, weset 1 2 nmax )03. level+;04. hlevel = f irstchoiceclustering(hlevel1 );05. initialize block positions by solveqp (hlevel );06. for currentlevel = level to 007. initialize bin grid size nb = blockn umber(hcurrentlevel );08. initialize base potential for each bin; k = 0;09. initialize lagrangian multipliers k = 0, b = 1, 2, 3, ., n2 ;bb10. initialize last_overf low_ratio = overf low_ratio;- 6 -11. initialize 0 = min(max( w (x,y)b d b (x,y), min ), max );12.do k 2k13. solve min w (x, y)lse + k b (d b mb + b ) ;14. k+;15. k = k1 + k1 (d m ), b = 1, 2, 3, ., n2 ;bb bbb16. compute overf low_ratio;17. if (overf low_ratio 5%)18. k = 1 k1 ;19. else if (overf low_ratio 0.5 last_overf low_ratio)20. k = 2 k1 ;21. else22. k = 3 k1 ;23. last_overf low_ratio = overf low_ratio;24. if (currentlevel = 0 & overf low_ratio 10%)25. call lookaheadlegalization() and save the best result;26. until (spreading enough or no further reduction in overow ratio)27. if (currentlevel = 0)28. restore the best look-ahead result and return;29. else30. decluster and update block positions.图 1: outline of our placement ow.3 implementation & experimentour global placement ow based on ntuplace3 15 is shown in fig.1. we set the parame- ters described above as follows: =0.85, =3.9, 1 =1.6, 2 =1.9, 3 =2.2. and the safeguardingparameters for initial density constraint weight (line 11 in fig.1) are min=2.0e-8, max=2.0e+8. these parameters are chosen by some experiments on ibm01, 03 and 05. for example, fig.2 shows the placement quality contour with respect to and . optimal results are located in dark blue regions. it shows that the algorithm obtains better results with moderate and ,and placements get worse when and both tend to 1.- 11 -5.554.543.532.521.5ibm010.20.40.60.815.554.543.532.521.5ibm030.20.40.60.815.554.543.532.521.5ibm050.20.40.60.811.11.081.061.041.0210.98图 2: placement quality isoline with respect to and to fair compare our global placement optimization method to ntuplace3s 15, our global placement results were fed into ntuplace3s(v12.03.26) 23 detail placement algorithm. and nal results are compared to ntuplace3(v12.03.26). thus the detail placement algorithms are the same. we conducted experiments based on iccad04 mix-size placement benchmarks 24, and no parameter tuning to specic benchmarks was employed. all experiments were performed on the same linux pc with two intel i5-2410m 2.3ghz cpus and 2 gb memory.3.1 hpwl- aware placement comparisontable 1 shows the benchmark statistics and our placement results compared to the state- of-the-art 2d placer ntuplace3(v12.03.26). in the table, the column “di%” is the relative dierence of our results compared to ntuplace3, and “gp+dp cpu” represents the running time of global placement plus detail placement. the results show that our method obtains better hpwl on sixteen out of eighteen benchmarks than ntuplace3, with only 85% running time. our approach are 2.6% better in hpwl on average, and up to 4.3% better for 6 circuits.3.2 whitespace reservation comparisonthe most commonly used optimization objective in placement/lorplan is total wirelength. however, a placement with small wirelength may be unroutable due to high placement density in some subregions 4. density target is a constraint that forces a placer to preserve specied white space in any subregions of placement area. the density target is a oating number that is larger than or equal to chip density. for example, if density target is 0.7, any local region should be less than or equal to 70% occupied. placement that not meet this constraint would表 1: the hpwl (e5 ) and total runtime (seconds) comparisons with ntuplace3(v12.03.26)on iccad04 ibm mixed-size benchmarkscircuitcellmacrohpwl(e5)gp+dp cpu(sec)np3oursdi(%)np3oursratioibm01 ibm02 ibm03 ibm04 ibm05 ibm06 ibm07 ibm08 ibm09 ibm10 ibm11 ibm12 ibm13 ibm14 ibm15 ibm16 ibm17 ibm181226019071225632692528146321544534850722528576789969779697888328514647416079418252218

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论