FPGA-basedfullypipelinedandhigh-qualityimagerotation.doc

Uploaded: 2019-07-11. Format: DOC. Pages: 8. Size: 422 KB.

FPGA-based fully pipelined and high-quality image rotation

Gongchao Tang, Luxin Yan
(Huazhong University of Science and Technology, IPRAI)

Abstract: This paper focuses on a low-latency, fully pipelined implementation of image rotation with reduced memory on an FPGA board. The method is based on the three-pass algorithm and cubic convolution interpolation. The difficulty of the work lies in constructing the full pipeline, because it is hard to obtain the positional relationship of pixels after shearing without storing them. We present mechanisms that judge the row offset and the column offset in order to select, within a template, the pixels in the same row or the same column for interpolation, which removes the bottleneck of the pipeline. Finally, we present the rotation results, a summary of the memory consumption in the FPGA, and a comparison with an existing method.

Key words: image rotation; three-pass algorithm; cubic convolution interpolation; FPGA pipelined implementation

0 Introduction

Digital image rotation is widely used in military, aerospace and medical imaging. Traditional image rotation is performed in 2-D space. Since the mapped pixel coordinates are generally not integers, 2-D interpolation must be used, and an n-degree ($n \ge 3$) interpolation is needed for higher quality. All of this makes the operation complicated and time-consuming, so a software implementation of rotation takes too long to run in real time. For a hardware implementation, the trade-off between image quality and the complexity of the approach has to be considered. The author of [1] implemented image rotation on a DSP and tested several interpolation methods, but the throughput is only 25 frames per second for video images, a large amount of memory is needed, and the calculation speed limits high-quality processing. For higher speed, an FPGA is therefore needed.
FPGA implementations such as [2] tend to use external memory to store the temporary results for pixel reorientation, which wastes much time on reading and writing data and lowers the efficiency of the FPGA. The three-pass algorithm decomposes the whole rotation into an appropriate sequence of 1-D signal translations. To assure the quality of the result, we apply cubic convolution interpolation after every shearing; it gives much better quality than most other interpolations, and the three-pass decomposition reduces it to a 1-D operation. In this paper we propose a fully pipelined implementation of rotation based on the three-pass algorithm and cubic convolution interpolation. It needs to store only 7 rows of the image, and only significant pixels are processed in the flow. The delay of the system is negligible.

1 Algorithm of image rotation

The basic 2-D image rotation operation can be described by the following equation:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (1)$$

Thus a gray-level pixel at position $(x, y)$ in the original image is mapped to the position $(x', y')$ in the destination image following a rotation of angular magnitude $\theta$.

Brief author introduction: Gongchao Tang, male, master's degree, IC design. Corresponding author: Luxin Yan, born 1978, male, associate professor, IC and computer.

The 2-D formula above can be decomposed as [3][4]:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 1 & -\tan\frac{\theta}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \sin\theta & 1 \end{pmatrix} \begin{pmatrix} 1 & -\tan\frac{\theta}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (2)$$

Each elementary transformation in (2) corresponds to a shearing of the image in the x or y direction.
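The decomposition in (2) can be checked numerically. The following is a minimal sketch, not part of the original design: it multiplies the three shear matrices and compares the product with the rotation matrix in (1). The helper names (`matmul`, `three_shear`, `rotation`, `max_abs_diff`) are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def three_shear(theta):
    """Product of the three shear matrices in (2); theta is in radians."""
    t = -math.tan(theta / 2.0)
    x_shear = [[1.0, t], [0.0, 1.0]]                # shear along x (1st and 3rd pass)
    y_shear = [[1.0, 0.0], [math.sin(theta), 1.0]]  # shear along y (2nd pass)
    return matmul(x_shear, matmul(y_shear, x_shear))

def rotation(theta):
    """Rotation matrix of (1); theta is in radians."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))

if __name__ == "__main__":
    for deg in (-45, -10, 0, 30, 45):
        th = math.radians(deg)
        # The difference should be at floating-point rounding level.
        print(deg, max_abs_diff(three_shear(th), rotation(th)))
```

The product agrees with the rotation matrix up to rounding error because $1 - \tan(\theta/2)\sin\theta = \cos\theta$ and $-\tan(\theta/2)(1 + \cos\theta) = -\sin\theta$.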
The processing is one-dimensional in nature, since each row is simply translated by an offset $\Delta x = -y\tan(\theta/2)$, or each column is translated by an offset $\Delta y = x\sin\theta$. We limit $\theta$ to $[-45°, 45°]$, because when the angle is outside this range we can first rotate the image by 90°, 180° or 270° without any loss [5].

Since we use cubic convolution interpolation after shearing, the corresponding 2-D interpolation formula has the form [6]:

$$f(x, y) = \sum_i \sum_j f(x_i, y_j)\, s(x - x_i)\, s(y - y_j) \qquad (3)$$

Here $f(x_i, y_j)$ is the gray-level value at $(x_i, y_j)$ after shearing, where $x_i$ and $y_j$ are fractional, and $f(x, y)$ is the new interpolated value at the integer coordinate $(x, y)$. The cubic interpolation function $s(\cdot)$ is given by [6]:

$$s(v) = \begin{cases} \frac{3}{2}|v|^3 - \frac{5}{2}|v|^2 + 1, & 0 \le |v| < 1 \\ -\frac{1}{2}|v|^3 + \frac{5}{2}|v|^2 - 4|v| + 2, & 1 \le |v| < 2 \\ 0, & 2 \le |v| \end{cases} \qquad (4)$$

Corresponding to the decomposability of the rotation matrix, (3) can be simplified to

$$f(x, y) = \sum_i f(x_i, y_j)\, s(x - x_i) \qquad (5)$$

or

$$f(x, y) = \sum_j f(x_i, y_j)\, s(y - y_j) \qquad (6)$$

From (4), the interpolation needs 4 pixels in the same row or in the same column at a time. Conversely, as long as we have 4 adjacent pixels in the same row or column, the pixel $(x, y)$ lies at the integer coordinate between the second and the third pixel.

Here we define a function of $\theta$:

$$\mathrm{sign}(\theta) = \begin{cases} 0, & \text{counter-clockwise rotation} \\ 1, & \text{clockwise rotation} \end{cases}$$

[Figure 1. Architecture view of design]

2 Implementation on FPGA board

The architecture of the design is shown in Figure 1. The whole method consists of 3 shearings. Modules such as coordinate calculation, offset calculation and interpolation contain only simple calculations, which can be easily implemented in an FPGA. As we do not use memory to store the result of every shearing, the data flow may not be arranged by coordinates (especially in the 3rd shearing), so the bottleneck of the pipeline is the judgment of the positions of the pixels used for interpolation. We give a mechanism by which, when a pixel flows in, its neighbors (in the row or column) can be determined within one clock cycle.
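As a reference for the interpolation step used after every shearing, the kernel in (4) and the 1-D sum in (5) can be sketched in software as follows. This is an illustrative floating-point model, not the hardware datapath; the function names `s` and `interpolate_1d` and the sample positions are assumptions chosen for the example.

```python
def s(v):
    """Cubic convolution kernel of (4)."""
    a = abs(v)
    if a < 1.0:
        return 1.5 * a**3 - 2.5 * a**2 + 1.0
    if a < 2.0:
        return -0.5 * a**3 + 2.5 * a**2 - 4.0 * a + 2.0
    return 0.0

def interpolate_1d(samples, positions, x):
    """Evaluate (5): sum of f(x_i) * s(x - x_i) over the 4 neighbors.

    samples   -- gray levels of 4 adjacent pixels in one row (or column)
    positions -- their fractional coordinates after shearing, unit spacing
    x         -- integer coordinate between the 2nd and 3rd pixel
    """
    return sum(f * s(x - xi) for f, xi in zip(samples, positions))

if __name__ == "__main__":
    # Four pixels shifted by a fractional offset of 0.3; target is x = 1.
    positions = [-0.7, 0.3, 1.3, 2.3]
    samples = [2.0 * xi + 1.0 for xi in positions]  # a linear ramp
    print(interpolate_1d(samples, positions, 1.0))
```

The four kernel weights sum to 1 for any fractional offset, so flat regions keep their gray level, and a linear ramp is reproduced exactly up to rounding.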
The judging details differ in every shearing. The shapes of the image after each shearing are shown in Figure 2.

[Figure 2. Results after shearing]

A. First shearing

The structure of the 1st shearing is shown in Figure 3. Pixels flow through registers a4, a3, a2 and a1, so that 4 adjacent pixels in the same row are selected for the first interpolation. After the 1st shearing these 4 pixels lie at fractional coordinates; they are used to calculate the pixel whose coordinate is the integer between a2 and a3.

The edge judgment module lets us build the correct template at the edge of the image. The module "coordinate calculate 1" computes the coordinates in the original image, from which we calculate the row offset and the value of the cubic interpolation function s(). The row offset is used to calculate the integer coordinate of the new pixel between a2 and a3 after the 1st shearing, and the values of the cubic interpolation function together with a4, a3, a2 and a1 give the value of the new pixel.

[Figure 3. Structure of the 1st shearing]

B. Second shearing

The structure of the 2nd shearing is shown in Figure 4; the image size is m × n. When the result image of the 1st shearing flows through registers a41, a42, a43, a12, a13 and the 3 FIFOs, we build up a template (with a32 as the center), shown in Figure 5, referring to Figure 2(2). Here we introduce a new module, "row offset judge", to find 4 adjacent pixels in the same column in the template.

[Figure 4. Structure of the 2nd shearing. Blocks shown: image from 1st shearing; registers a11, a12, a13, a21, a22, a23, a32, a41, a42, a43; FIFO1, FIFO2 and FIFO3 (depth = m-3 each); edge judgement; row offset judge; column offset calculate; s(|y - y_i|) calculate; interpolation of a4, a3, a2, a1; coordinate calculate; outputs: result and coordinate of result.]

We know that the row offset in the 1st shearing is $\Delta x = -y\tan(\theta/2)$, which means that the difference between the offsets of any two adjacent rows is $\tan(\theta/2)$. Over the range $-45° \le \theta \le 45°$ we have $\max|\tan(\theta/2)| = \tan 22.5° = 0.414$, and $2\tan 22.5° = 0.828$.
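The bound just stated can be checked by brute force. The sketch below is only an illustration: the text does not specify how the fractional offset is turned into an integer pixel shift, so flooring is an assumption here, as are the function names, the row count and the angle step.

```python
import math

def row_offset(y, theta_deg):
    """Integer offset of row y in the 1st shearing.

    Flooring -y * tan(theta / 2) is an assumed rounding rule.
    """
    return math.floor(-y * math.tan(math.radians(theta_deg) / 2.0))

def max_offset_step(theta_deg, rows, gap):
    """Largest |offset(y + gap) - offset(y)| over all rows for one angle."""
    return max(abs(row_offset(y + gap, theta_deg) - row_offset(y, theta_deg))
               for y in range(rows - gap))

if __name__ == "__main__":
    angles = [a / 2.0 for a in range(-90, 91)]  # -45 to 45 degrees, 0.5 degree steps
    # gap=1: adjacent rows; gap=2: first and last of three adjacent rows.
    print(max(max_offset_step(a, 64, 1) for a in angles))
    print(max(max_offset_step(a, 64, 2) for a in angles))
```

Because the real-valued offsets of rows at most two apart differ by less than 1 pixel, their integer parts can differ by at most 1, which is the property the row offset judge relies on.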
So the difference among the offsets of any three adjacent rows is no more than 1 pixel. We conclude that a row must either shift by 1 pixel or hold still relative to its neighbor in the 1st shearing.

[Figure 5. Template in the 2nd shearing]

According to the analysis above, to select 4 pixels a1, a2, a3, a4 in the same column we judge the row offsets in the 1st shearing. Without question, a3 = a32.

Take a1 as an example. If offset_row3 == offset_row1, row3 and row1 hold still relative to each other in the 1st shearing, so a32 and a12 are still in the same column and we take a1 = a12. Otherwise (offset_row3 != offset_row1), row1 has shifted 1 pixel relative to row3. In that case, if sign(θ) = 0, row1 shifts 1 pixel left, so after the 1st shearing a13 and a32 are in the same column and a1 = a13; otherwise a1 = a11.

Following the same rule of selection, a2 and a4 are given by:

    a2 = (offset_row3 == offset_row2) ? a22 : (sign(θ) ? a21 : a23);
    a4 = (offset_row3 == offset_row4) ? a42 : (sign(θ) ? a43 : a41);

C. Third shearing

The flow of the 3rd shearing is similar to that of the 2nd shearing, but selecting a1, a2, a3 and a4, which must be adjacent and in the same row, is more complicated. Following Figure 2(3), Figure 6 shows the template (with a33 as the center) built in the 3rd shearing.

[Figure 6. Template built in the 3rd shearing]

Here we use the row offset in the 1st shearing and the column offset in the 2nd shearing to find a1, a2, a3 and a4. As $\max|\sin\theta| = 0.7071$ over the range $-45° \le \theta \le 45°$, the difference between the offsets of any two adjacent columns is no more than 1 pixel.

We take a33 as the center of the template.
First we assign the third pixel a3 = a33. The selection of the remaining 3 pixels is based on the offset of the row or column they are in, compared with a33. We take the selection of a1 as an example.

1) If offset_column3 == offset_column1, column 1 and column 3 in Figure 6 hold still relative to each other in the 2nd shearing. As there is only row shift in the 1st shearing, a31 is in the same row as a33 all the time, so a31 is the right choice for a1.

2) Else if |offset_column3 - offset_column1| == 1, column 1 has moved 1 pixel relative to column 3, up or down, so a1 must be chosen from row2 or row4.
If sign(θ) = 0, the image is rotated counter-clockwise and column 1 must be shifted down relative to column 3, so the pixels of row2 are the right choice. If offset_row3 == offset_row2, row3 and row2 hold still relative to each other in the 1st shearing, so a21 moves to a31 after the 2nd shearing and a1 = a21. If offset_row3 != offset_row2, a22 moves to a21 relative to a33 in the 1st shearing and then moves to a31 in the 2nd shearing, so a1 = a22.
When sign(θ) = 1, the image is rotated clockwise and column 1 must be shifted up relative to column 3, so the pixels of row4 are the right choice for a1.
The complete judgment in this condition is similar to that for sign(θ) = 0.

3) Else if |offset_column3 - offset_column1| == 2, column 1 has shifted 2 pixels relative to column 3, up or down, so a1 must be chosen from row1 or row5, and the complete judgment is similar to 2).

Based on this rule of selection, we select a2 as follows:

    if (offset_column3 == offset_column2) a2 = a32;
    else if (offset_column3 != offset_column2 & sign(θ) == 0)
        if (offset_row3 == offset_row2) a2 = a22;
        else if (offset_row3 != offset_row2) a2 = a23;
    else if (offset_column3 != offset_column2 & sign(θ) == 1)
        if (offset_row3 == offset_row4) a2 = a42;
        else if (offset_row3 != offset_row4) a2 = a43;

The method for selecting a4 is similar to, and can even be regarded as the same as, the method for a2.

The values of $\sin\theta$ and $\tan(\theta/2)$ are stored in a look-up table with 0.01° resolution, so the value of a trigonometric function is available within 1 clock cycle.

3 Results and discussion

We simulate the method with a 256 × 256 8-bit image in ModelSim; the processing time and memory consumption are summarized in Table 1. It can be seen that the proposed method costs less time and fewer memory bits. The method is validated on a PCI card; the pixel frequency is 30 MHz, and the FPGA is an EP1S20 of the Stratix series. For a 128 × 128 8-bit image, the processing delay is only about 0.014 ms, and the whole rotation takes about 0.56 ms. The total memory consumption is 7168 bits, which means that we need to store the pixels of only 7 rows in all. The processing time and memory requirement depend only on the size of the image and the pixel frequency (see Table 2).
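The figures quoted above for the 128 × 128 case can be reproduced from the closed-form expressions that the paper derives from Table 3, $t_d = (38 + 3m)/f$ and $t = (38 + 3m + mn)/f$ in microseconds with f in MHz, together with the 7-row buffer. The sketch below is only a calculator for those expressions; the function names are illustrative, and treating m as the number of pixels per row is an assumption based on the FIFO depth of m-3.

```python
def memory_bits(m, bits_per_pixel, rows=7):
    """Buffer size when only `rows` image rows of m pixels are stored."""
    return rows * m * bits_per_pixel

def latency_us(m, f_mhz):
    """Pipeline latency t_d = (38 + 3m) / f, in microseconds."""
    return (38 + 3 * m) / f_mhz

def total_time_us(m, n, f_mhz):
    """Whole processing time t = (38 + 3m + mn) / f, in microseconds."""
    return (38 + 3 * m + m * n) / f_mhz

if __name__ == "__main__":
    # 128 x 128, 8-bit image at a 30 MHz pixel frequency.
    print(memory_bits(128, 8))            # bits
    print(latency_us(128, 30.0))          # microseconds
    print(total_time_us(128, 128, 30.0))  # microseconds
```

For this case the expressions give 7168 bits, about 14.07 µs of latency and about 560.2 µs in total, matching the 0.014 ms and 0.56 ms reported in the text.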
Furthermore, let the image size be m × n × k bit and the pixel frequency be f (MHz). The latency of every shearing is shown in Table 3. From the table we conclude that the latency of the method is $t_d = (38 + 3m)/f$ (µs), and the whole processing time is $t = (38 + 3m + mn)/f = (38 + m(n + 3))/f$ (µs).

Table 1 Comparison of the 3 methods

Method           Rotation time  Angle  Memory consumption  Image size         Frequency
[2]              17.6 ms        45°    2 × 512k × 8 bit    256 × 256 × 8 bit  20 MHz
Proposed method  3.32 ms        45°    7 × 256 × 8 bit     256 × 256 × 8 bit  20 MHz

Table 2 Statistics of images in different sizes

Image size           Latency (ms)  Rotation time (ms)  Memory (kbits)  Frequency (MHz)
256 × 256 × 8 bit    0.016         1.32                14              50
512 × 512 × 8 bit    0.031         5.28                28              50
1024 × 1024 × 8 bit  0.062         21.04               56              50

Table 3 Statistics of latency (counted in clock cycles)

Stage         Template built  Selection of pixels  Row or column offset calculate  Interpolation
1st shearing  1               0                    3                               6
2nd shearing  m+1             3                    3                               6
3rd shearing  2m+1            5                    3                               6

4 Conclusion

In this paper, we have presented the implementation of image rotation and validated the method with a 128 × 128 image. The whole rotation is implemented without storing a large-scale image, the fully pipelines latency is only pixe
