Foreign-literature translation: A DSP-based embedded system for real-time object recognition using local features

Appendix A

Real-time object recognition using local features on a DSP-based embedded system

Abstract

In the last few years, object recognition has become one of the most popular tasks in computer vision. In particular, this was driven by the development of powerful new algorithms for local-appearance-based object recognition. So-called smart cameras with enough power for decentralized image processing have become increasingly popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool, as the robust recognition of suspicious vehicles, persons or objects is a matter of public safety. This makes the deployment of recognition capabilities on embedded platforms necessary. In our work we investigate the task of object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suitable for our dedicated hardware setup. We carefully investigate the parameters of the algorithm with respect to the performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.

Keywords: DSP; object recognition; local features; vocabulary tree

1. Introduction

Object recognition is one of the most popular tasks in the field of computer vision. In the past decade, great efforts were made to build robust object recognition systems based on appearance features with local extent. For such a framework to be applicable in the real world, several attributes are very important: insensitivity to rotation, illumination or viewpoint changes, as well as real-time behavior and large-scale operation. Current systems already have many of these properties and, though not all problems have been solved yet, they are becoming more and more attractive to industry for inclusion in consumer products. In turn, embedded vision platforms such as smart cameras have recently emerged successfully, although they offer only a limited amount of computational and memory resources. Nevertheless, embedded vision systems are already present in our everyday life.

Almost everyone's mobile phone is equipped with a camera and can thus be treated as a small embedded vision system. Clearly this gives rise to new applications, such as navigation tools for visually impaired persons, or collaborative public monitoring using millions of artificial eyes. In addition, the low price of digital sensors and the increased need for security in public places have led to a tremendous growth in the number of cameras mounted for surveillance purposes. They have to be small in size and have to process the huge amounts of available data on site. Furthermore, they have to perform dedicated operations automatically and without human interaction. Not only in the field of surveillance, but also in the areas of household robotics, entertainment, military and industrial robotics, embedded computer vision platforms are becoming more and more popular due to their robustness against environmental adversities. DSP-based embedded platforms are especially popular, as DSPs are powerful and cheap processors that are still small in size and efficient in terms of power consumption. As DSPs offer the maximum in flexibility of the software to be run, compared to other embedded units like FPGAs, ASICs or GPUs, their current success is not surprising.

For the reasons already mentioned, recognition tasks are a very important area of research. However, some attributes of embedded platforms strictly limit the practicability of current state-of-the-art approaches. For example, the amount of memory available on a device strictly limits the number of objects in the database. Therefore, when building an embedded object recognition system, one goal is to make the amount of data needed to represent a single object as small as possible in order to maximize the number of recognizable objects. Another important aspect is the real-time capability of these systems. Algorithms have to be fast enough to be operational in the real world. They have to be robust and user-friendly; otherwise, a product equipped with such functionality is simply unattractive to a potential customer.
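To make the memory constraint described above concrete, the following minimal sketch estimates how many objects fit into a fixed memory budget when each object is stored as a set of raw descriptors. All figures (the 1 MiB budget, 100 descriptors per object, the descriptor dimensions and value widths) and the function name `max_objects` are assumptions chosen only for illustration; they are not taken from the paper, and a real system would store a more compact structure than raw descriptors.

```python
def max_objects(memory_bytes, descriptors_per_object, dims, bytes_per_value):
    """Number of whole objects whose raw descriptors fit into memory_bytes."""
    bytes_per_object = descriptors_per_object * dims * bytes_per_value
    return memory_bytes // bytes_per_object


# Hypothetical figures, not measurements from the paper.
BUDGET = 1 * 1024 * 1024   # 1 MiB reserved for the object database
PER_OBJECT = 100           # descriptors stored per object

# 128-dimensional descriptors stored as 4-byte floats
full = max_objects(BUDGET, PER_OBJECT, dims=128, bytes_per_value=4)

# reduced 32-dimensional descriptors stored as single bytes
compact = max_objects(BUDGET, PER_OBJECT, dims=32, bytes_per_value=1)

print(full, compact)
```

Under these assumed figures, shrinking the per-object representation by a factor of 16 raises the number of storable objects by roughly the same factor.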

For example, in an interactive tour through a museum, object recognition on a mobile device has to be fast enough to allow for continuity in guidance. Formally speaking, we consider this to be an application requiring soft real-time system behavior. Clearly, this is just one example, and the exact meaning of the term real-time depends on the concrete application. We consider an object recognition system to be real-time capable if it is able to deliver at least one result per second. This is already sufficient for many applications, such as the interactive museum tour introduced above. However, it is clear that this definition does not suit other applications, and that an improvement in throughput is needed for object recognition at frame rate, for instance in combination with object tracking. To summarize, building a full-featured recognition system on an embedded platform turns out to be a challenging problem given all the different aspects and environmental restrictions to consider.

In this work, we describe a method to deploy a medium-sized object recognition system on a prototypical DSP-based embedded platform. To the best of our knowledge, we are the first to extensively investigate issues related to object recognition in the context of embedded systems; to date this is the only work studying the influence of various parameters on recognition performance and runtime behavior. We pick a set of high-level algorithms to describe objects by a set of appearance features. As a prototypical local-feature-based recognition system, we use difference of Gaussian (DoG) keypoints and principal component analysis scale invariant feature transform (PCA-SIFT) descriptors to build compact object representations. By arranging this information in a tree-like data structure based on k-means clustering, a so-called vocabulary tree, real-time behavior is achieved. By applying a dedicated compression mechanism, the size of the data structure can be traded off against the recognition performance, so that the properties of a recognition system can be accurately tuned to a given hardware platform. As shown in extensive evaluations, considerable gains in recognition performance and throughput can be achieved by considering both the special properties of the algorithms and the dedicated advantages of special hardware.

The remainder of this paper
is structured as follows. In Sect. 2 we give an overview of developments in both areas that we bring together in our work. On the one hand, we list a number of references in the context of object recognition by computer vision; on the other hand, we cite a number of publications from the area of embedded smart sensors. A detailed description of the methods involved in building our object recognition algorithm is given in Sect. 3. In Sect. 4 we outline our framework and give details about the training and implementation of our system. We closely describe all steps in designing our approach and give side notes on alternative methods. In Sect. 5, we experimentally evaluate our system on a challenging object database and discuss real-time and real-world issues. Furthermore, we investigate some special features of our approach and elucidate the dependence of the overall system performance on several parameters. The work concludes with some final notes and an outlook on future work in Sect. 6.

2. Related work

In the following we give a short introduction to the topic of local-feature-based object recognition. Due to the huge amount of literature available, we focus on the most promising approaches using local features, and refer to those algorithms which are related to our work. We also give a short overview of object recognition in the context of embedded systems, which, due to the sparseness of existing approaches, covers both global and local methods, as well as algorithms implemented on FPGA- and DSP-based platforms.

Local-appearance-based visual object recognition became popular after the development of powerful interest region detectors and descriptors. Early full-featured object recognition systems dealing with all the individual algorithmic steps and their related problems were proposed by Schmid and Mohr, and by Schiele and Crowley. The main idea behind local-feature-based object recognition is to build object representations from collections of locally sampled descriptions. In other words, the appearance of local parts of a single
object is encoded in descriptors, and a set of these descriptors forms the final object representation. To find the distinguishable regions, so-called interest region detectors are used, which find regions or points of special visual distinctiveness. The neighborhood of such regions is subsequently encoded using a special transform to build a description that inherently provides several desirable properties. Besides insensitivity to illumination changes and partial viewpoint invariance, representations as sets of local descriptors offer robustness against background clutter and partial occlusions. Needless to say, a so-called bag-of-descriptors representation can be built using a single detector and descriptor or several combinations of different ones.

The collection of all descriptors from multiple objects (i.e., bags of descriptors) is used to build a database. Given this database and a new representation of an object to be recognized, correspondences are counted in a voting scheme to determine the correct match. Determining these correspondences is a complex task. Descriptors are high-dimensional feature vectors, and matching a query descriptor means determining its exact nearest neighbors in the database. Unfortunately, no algorithms are currently known that can determine the exact nearest neighbor of a point in high-dimensional spaces any more efficiently than exhaustive search. Due to the large number of objects and, correspondingly, of local descriptors, this type of information management is unwieldy and inefficient. Thus, a number of different methods to approximate the solution efficiently have been proposed to keep the performance of an overall object recognition system manageable.

The basic principle of interest points and regions is the search for spots and areas in an image which exhibit a predefined property making them special in relation to their local neighborhood. This property should make the region distinguishable from its neighborhood and repeatably detectable. Furthermore, the detection of these features should be, as far as possible, illumination and viewpoint invariant.

The first important interest point detector, the so-called Harris corner detector, was proposed in 1988 by Harris and Stephens. It exhibits excellent repeatability and was subsequently used for object recognition purposes by Schmid and Mohr. An extension of the Harris detector to include scale information was later reported by Mikolajczyk and Schmid as the Harris-Laplace detector and was used by Schaffalitzky and Zisserman for multi-view matching of unordered image sets. Another approach, aimed at detecting blob-like image structures, is to search for points where the determinant of the Hessian matrix assumes a local extremum; this is called the Hessian detector. Further developments to include affine covariance resulted in the Harris-affine and Hessian-affine detectors proposed by Mikolajczyk and by Mikolajczyk and Schmid.

The currently most popular two-part approach, known as the scale invariant feature transform (SIFT), was proposed by Lowe; its first part is an interest point detector. This DoG detector takes the differences of Gaussian-blurred images as an approximation of the scale-normalized Laplacian and uses the local maxima of the responses in scale space as an indicator for a keypoint. A complementary feature detector, the maximally stable extremal regions (MSER) detector, was proposed by Matas et al. In short, the MSER detector searches for regions which are brighter or darker than their surroundings, i.e., which are surrounded by darker or, vice versa, brighter pixels. First, pixels are sorted in ascending or descending order of their intensity value, depending on the region type to be detected. The pixel array is sequentially fed into a union-find algorithm, and a tree-shaped data structure is maintained whose nodes contain information about pixel neighborhoods as well as about intensity value relationships. Finally, nodes which satisfy a set of predefined criteria are sought by a tree-traversal algorithm.

Two affine covariant region detectors were proposed by Tuytelaars and Van Gool: intensity-based regions (IBR) and edge-based regions (EBR). IBRs are based on extrema in intensity. Given a local intensity extremum, the brightness function along rays emanating from the extremum is studied. This function itself exhibits an extremum at locations where the image intensity suddenly changes. Linking all points of the emanating rays corresponding to this extremum forms an IBR. EBRs are determined from corner points and nearby edges. Given a single corner point and walking along the edges in opposite directions with two more control points, a one-dimensional class of parallelograms is introduced using the corner itself and the vectors pointing from the corner to the control points. By studying a function of texture and using additional constraints, a single parallelogram is selected to be an EBR.

Another algorithm, termed the salient region detector, was proposed by Kadir et al. and is based on the probability density function (PDF) of intensity values computed over an elliptical region. For each pixel, the entropy extrema for an ellipse centered at this pixel are recorded over the ellipse parameters orientation θ, scale s, and ratio of major to minor axis λ. From a sorted list of all region
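candidates, the n most salient ones are chosen. An extensive evaluation of a large number of affine region detectors is available in the literature.

Generally speaking, a descriptor is an abstract characterization of an image patch. Usually, the image patch is chosen to be the local environment of an interest region. Depending on the algorithm or transformation used, the resulting characterization can be made rotation invariant or, at least partially, insensitive to affine transformations.

Most approaches are based on gradient calculations or image brightness values. As the second part of the SIFT approach, Lowe proposed the use of descriptors based on stacked gradient histograms. The individual histograms are calculated on a subdivided patch and describe the gradient orientation in each subregion in order to retain spatial information. Finally, they are concatenated to form a 128-dimensional descriptor. More recently, Ke and Sukthankar proposed the so-called PCA-SIFT descriptor based on eigenspace analysis. They calculated a principal component analysis (PCA) eigenspace on the gradient images of a representative set of over 20,000 image patches. The descriptor of a new image tile is generated by projecting the gradients of the tile onto the precalculated eigenspace, keeping only the d most significant eigenvectors. Thus, an efficient compression of the descriptor dimensionality is achieved while keeping the performance at a level comparable to the original SIFT descriptor. Closely related to the SIFT approach, the gradient location and orientation histogram (GLOH) descriptor was proposed by Mikolajczyk and Schmid. In contrast to SIFT, gradient histograms are calculated on a finer circular grid rather than a coarser rectangular one, which results in a 272-dimensional histogram. PCA is subsequently used to reduce the descriptor dimensionality to 128 again. Two rotation-invariant descriptors were proposed by Lazebnik et al.: the rotation-invariant feature transform (RIFT) and the spin-image descriptor. The RIFT descriptor is calculated on a circular normalized patch which is divided into concentric rings of equal width. Within each ring, the gradient orientation histogram is computed, with the gradient direction calculated relative to the direction of the vector pointing outward from the center. The spin-image is a two-dimensional histogram encoding the distribution of image brightness values in the neighborhood of a particular center point. The histogram has two dimensions, namely the distance from the center point and the intensity value. After quantizing the distance, each distance bin holds the histogram of the intensity values of pixels located at that fixed distance from the center point.

Appendix B

A DSP-based embedded system for real-time object recognition using local features

Abstract

In the past few years, object recognition has become one of the most popular tasks in computer vision. In particular, this has been driven by the development of powerful new algorithms for local-feature-based object recognition. So-called smart cameras with enough power for decentralized image processing have become increasingly popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool, since the robust recognition of suspicious vehicles, persons or objects is a matter of public safety. This makes recognition capability a basic requirement for embedded platforms. In our work, we investigate the task of object recognition based on state-of-the-art algorithms on a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suited to our dedicated hardware setup. We carefully study the algorithm's parameters with respect to performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even when computational and memory resources are strictly limited.

Keywords: digital signal processing; object recognition; local features; vocabulary tree

1. Introduction

Object recognition is one of the most popular tasks in the field of computer vision. Over the past decade, many researchers have worked to build robust object recognition systems based on appearance features with local extent. For such a framework to be applicable in the real world, several attributes are very important: insensitivity to rotation, illumination or viewpoint changes, as well as real-time behavior and large-scale operation. Current systems already have many of these properties and, although not all problems have been solved, they are becoming increasingly attractive to industry for inclusion in consumer products. In turn, embedded vision platforms such as smart cameras have recently emerged successfully, though they offer only limited computational and memory resources. Nevertheless, embedded vision systems are already part of our everyday life. Almost everyone's mobile phone is equipped with a camera and can therefore be regarded as a small embedded vision system. Clearly, this gives rise to new applications, such as navigation tools for visually impaired persons, or collaborative public monitoring using millions of artificial eyes. In addition, the low price of digital sensors and the increased need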
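The abstract mentions an object database built on a vocabulary tree, which Appendix A describes as a tree-like data structure based on k-means clustering that is queried through a voting scheme. The sketch below illustrates that idea in plain Python on toy 2-D "descriptors". It is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the deterministic k-means initialisation, the unweighted one-vote-per-object scheme and the toy data are all hypothetical, and real descriptors would have many more dimensions.

```python
from collections import Counter


def sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centers):
    """Index of the center closest to v."""
    return min(range(len(centers)), key=lambda c: sqdist(v, centers[c]))


def kmeans(points, k, iters=10):
    """Plain Lloyd k-means with a deterministic, evenly spaced initialisation."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a cluster ran empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def build_tree(items, k, depth):
    """items: list of (descriptor, object_label) pairs. Returns a nested dict."""
    if depth == 0 or len(items) < k:
        # Leaf: remember which objects contributed descriptors here.
        return {"votes": Counter(label for _, label in items)}
    centers = kmeans([v for v, _ in items], k)
    buckets = [[] for _ in range(k)]
    for v, label in items:
        buckets[nearest(v, centers)].append((v, label))
    return {"centers": centers,
            "children": [build_tree(b, k, depth - 1) for b in buckets]}


def lookup(tree, v):
    """Descend from the root to a leaf, following the nearest center each time."""
    while "children" in tree:
        tree = tree["children"][nearest(v, tree["centers"])]
    return tree["votes"]


def recognize(tree, query_descriptors):
    """Each query descriptor casts one vote for every object seen in its leaf."""
    tally = Counter()
    for v in query_descriptors:
        tally.update(lookup(tree, v).keys())
    return tally.most_common(1)[0][0] if tally else None


# Toy training data (hypothetical): three objects, four descriptors each.
training = {
    "cup":   [(0, 0), (0, 1), (1, 0), (1, 1)],
    "book":  [(10, 10), (10, 11), (11, 10), (11, 11)],
    "phone": [(0, 10), (0, 11), (1, 10), (1, 11)],
}
items = [(v, label) for label, vecs in training.items() for v in vecs]
tree = build_tree(items, k=3, depth=2)

query = [(0.2, 0.1), (0.9, 0.8), (0.1, 0.9), (10.2, 10.9)]
print(recognize(tree, query))
```

Each lookup compares a descriptor against only k centers per level instead of against every stored descriptor, which is the property that makes this kind of structure attractive when exhaustive nearest-neighbor search is too slow. The branching factor `k` and the `depth` are the tuning knobs in this sketch.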
