
Online Planning for Autonomous Underwater Vehicles Performing Information Gathering Tasks in Large Subsea Environments

Harun Yetkin, James McMahon, Nicholay Topin, Artur Wolek, Zachary Waters, and Daniel J. Stilwell

Abstract— We present an anytime Monte Carlo tree search (MCTS) algorithm that generates real-time, near-optimal search paths in large subsea environments. The MCTS planner continuously builds a tree of the search space until either the allowed time per move is reached or the budget constraint for the search mission is met. To improve the performance of the MCTS planner, we propose a novel heuristic action selection policy for determining the value of a leaf node. The proposed heuristic is tailored to problems where making a turn incurs a higher cost than moving straight, as is the case for autonomous underwater vehicles. Through extensive simulations, we show that our heuristic yields a significant performance improvement over a lawnmower path planner, a commonly employed approach in subsea search applications, and over a simple MCTS planner where actions are selected uniformly at random. In our numerical illustrations, we use a real data set abstracted from sonar measurements acquired from Boston Harbor.

I. INTRODUCTION

This paper addresses the computational challenge of computing near-optimal, real-time search paths in subsea environments when the search environment is very large and real-time calculation of optimal paths is infeasible. Our goal is to
find an unknown number of targets distributed in a bounded search domain. We use a decision-theoretic cost function that considers the performance of the search sensor to compute the value of searching a location. After computing the search value for each location, we cast the problem as the well-known orienteering problem, in which the search vehicle collects a certain reward for visiting a specific location and the goal is to maximize the collected reward within the limited time or distance the vehicle can travel. To generate the search paths in real time, we employ Monte Carlo tree search (MCTS).
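As a concrete toy illustration of the orienteering abstraction, the sketch below exhaustively enumerates budget-limited grid paths and keeps the highest-reward one. The grid, reward values, and unit move cost are invented for illustration, and exhaustive search is exactly what becomes infeasible at the scales this paper targets:

```python
def best_orienteering_path(rewards, start, budget):
    """Exhaustively search 4-connected grid paths from `start` that use
    at most `budget` moves, maximizing total reward over distinct visited
    cells. Tractable only for tiny grids and budgets."""
    rows, cols = len(rewards), len(rewards[0])
    best = {"reward": float("-inf"), "path": None}

    def dfs(cell, visited, spent, path):
        reward = sum(rewards[r][c] for r, c in visited)
        if reward > best["reward"]:
            best["reward"], best["path"] = reward, list(path)
        if spent == budget:
            return
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nxt = (nr, nc)
                path.append(nxt)
                dfs(nxt, visited | {nxt}, spent + 1, path)
                path.pop()

    dfs(start, {start}, 0, [start])
    return best["reward"], best["path"]

rewards = [[0.1, 0.9, 0.1],
           [0.2, 0.8, 0.1],
           [0.1, 0.1, 0.7]]
val, path = best_orienteering_path(rewards, (0, 0), budget=3)
```

With three moves from the corner, the solver gravitates to the high-reward block in the upper left; the branching factor, however, grows exponentially in the budget, which motivates the sampling-based planner below.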

While MCTS is a standard approach in many robotic applications, it is less often employed in subsea search problems since its performance can be very poor over long time horizons. To this end, the main contribution of this paper is to propose a novel MCTS heuristic policy that is tailored specifically to vehicles performing synthetic aperture

*This work was supported by the Office of Naval Research via grants N00014-16-1-2092, N00014-18-1-2627, and N00014-19-1-2194. Corresponding author, contributed equally. N. Topin is with the Carnegie Mellon University School of Computer Science, Pittsburgh, PA, USA. A. Wolek is with the University of Maryland, College Park, MD, USA. J. McMahon and Z. Waters are with the US Naval Research Laboratory, Code 7130, Washington, DC, USA. H. Yetkin is an affiliate of the Center for Marine Autonomy and Robotics at Virginia Tech, and on the faculty of the

Department of Mechatronics Engineering, Bartın University, Turkey. D. Stilwell is with the Bradley Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA, USA.

sonar processing, such as in subsea applications. However, with slight modification, the proposed MCTS heuristic policy can be applied to a wide range of search applications, including airborne search-and-rescue missions. This study builds on our prior work reported in [1] and [2]. We consider the search problem previously defined in [1]: an autonomous search vehicle is tasked with finding an unknown number of stationary targets randomly distributed in a bounded search domain within a time or distance constraint. We assume the search vehicle is equipped with a search sensor that detects the number of targets in a location and an environment characterization sensor that detects the environmental conditions at that location, and both sensors operate simultaneously. We also assume that the performance of the search sensor is affected by spatial variation of the environment; that is, in some locations the search sensor performs better than in others. Our
decision-theoretic cost function, as in prior work, accounts for false alarms, missed detections, and environmental uncertainty.

In our prior work [2], we address the computational challenge of computing near-optimal search paths and develop an approximate solver based on a special case of the traveling salesman problem. The approximate solver yields results comparable to the exact solution while requiring significantly less computational power. However, in [2], we consider a search sensor with the typical characteristics of a side-scan sonar, and thus the vehicle is required to travel in straight lines across the entirety of the map. In this study, we extend our prior work by allowing the vehicle to make turns within the map rather than taking only parallel straight lines. We note that this is not a small extension, owing to the substantial increase in problem complexity, and
the approach reported herein is fundamentally different from our prior work. To account for turns within the map, we add a penalty to the cost function each time a turn action is taken. The numeric value of the penalty is computed off-line by considering the additional time required to maneuver to an adjacent cell, assuming a kinematic vehicle model. That is, the vehicle either goes straight or makes a turn to move into a more promising part of the search area at the expense of paying a penalty for making a turn. While this theoretically yields performance no worse than the case in which the vehicle is required to travel in straight lines.
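The turn-penalized cost bookkeeping just described can be sketched as follows; the per-move cost and the penalty value here are illustrative stand-ins (the paper computes the actual penalty off-line from a kinematic vehicle model):

```python
def path_cost(path, move_cost=1.0, turn_penalty=2.5):
    """Cost of a cell path: one `move_cost` per transition, plus
    `turn_penalty` whenever the heading changes between consecutive
    moves. `path` is a list of (row, col) cells; the constants are
    illustrative, not the paper's values."""
    cost = 0.0
    prev_heading = None
    for a, b in zip(path, path[1:]):
        heading = (b[0] - a[0], b[1] - a[1])
        cost += move_cost
        if prev_heading is not None and heading != prev_heading:
            cost += turn_penalty  # turning is dearer than going straight
        prev_heading = heading
    return cost

straight = [(0, 0), (0, 1), (0, 2), (0, 3)]  # three moves, no turns
dogleg   = [(0, 0), (0, 1), (1, 1), (1, 2)]  # three moves, two turns
```

Under these stand-in constants the straight path costs 3.0 while the dogleg of equal length costs 8.0, which is precisely the asymmetry the RANDOM-LINES heuristic is designed to exploit.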

The described problem, however, has a drastically larger state space that requires significantly greater computational power. To meet the computational requirements of the problem, we employ an MCTS planner. Often, MCTS planners consider a random walk of the discretized search space to determine the value of a leaf node. However, in our search problem, a random walk can result in poor search performance, since turn penalties are accounted for and accrue significant cost over long planning horizons. Thus, in this study, we propose a novel heuristic, referred to as the RANDOM-LINES heuristic, to determine the value of a leaf node. The RANDOM-LINES heuristic aims to minimize the number of turns while avoiding searching the segments with considerably less reward. We test the efficacy of our heuristic through extensive numerical simulations where search paths are planned using sonar data previously acquired from Boston Harbor.

2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, November 4-8, 2019. 978-1-7281-4003-2/19/$31.00 ©2019 IEEE

The organization of this paper is as follows. In Section II, we provide a literature review of similar problems. In Section III, the search problem is introduced and the value of searching a location is rigorously formulated. In Section IV, we present an anytime planning algorithm for informative path planning for subsea search and present our novel heuristic. Section V provides simulated results, and Section VI provides concluding remarks.
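A minimal sketch of such an anytime MCTS loop follows. The structure and constants are assumptions for illustration, not the paper's implementation: the tree grows until the per-move time budget expires, leaves are scored by a pluggable rollout policy (uniform-random by default, or a heuristic such as the RANDOM-LINES policy introduced above), and the most-visited root action is returned.

```python
import math
import random
import time

class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0

def uct_plan(root_state, actions, step, rollout, time_per_move=0.05, c=1.4):
    """Anytime UCT: grow the tree until the per-move time budget expires,
    then return the most-visited root action. `rollout(state)` estimates
    a leaf's value -- this is where a heuristic policy plugs in."""
    root = Node(root_state)
    deadline = time.monotonic() + time_per_move
    while time.monotonic() < deadline:
        node = root
        # 1) selection: descend via UCB1 while fully expanded
        while node.children and len(node.children) == len(actions):
            node = max(node.children, key=lambda n: n.value / n.visits
                       + c * math.sqrt(math.log(node.visits) / n.visits))
        # 2) expansion: add one untried action
        tried = {ch.action for ch in node.children}
        untried = [a for a in actions if a not in tried]
        if untried:
            a = random.choice(untried)
            node.children.append(Node(step(node.state, a), node, a))
            node = node.children[-1]
        # 3) simulation: estimate the leaf's value with the rollout policy
        reward = rollout(node.state)
        # 4) backpropagation
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).action
```

On a toy one-dimensional state (position, with moves left/right and value equal to position), the loop reliably returns the rewarding direction; swapping `rollout` is the single change needed to compare a random-walk leaf evaluation against a heuristic one.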

II. RELATED WORK

As unmanned platforms and sensing technology have developed, the need for more sophisticated autonomous frameworks has grown. One main challenge that has not yet been fully addressed is the development of a deliberative planner that can be applied in large environments. Typically, a single autonomous agent will need to cover areas on the order of several square kilometers or more. When formulated as a search problem over a gridded space with a discrete set of possible actions and observations, the problem has been shown to be NP-hard [3], and as such it has been a major area of research in the recent literature, generally described as the informative path planning (IPP) problem.

Recently, to generate high-quality paths in real time on board unmanned vehicles, state-of-the-art systems have employed a variety of approaches. Applying branch-and-bound (BnB) algorithms has had varying degrees of success. In [4], BnB was used to find optimal paths through exhaustive search over finite time horizons. BnB allows for the discovery of optimal solutions. However, in order to obtain optimal solutions quickly, the search must be constrained with tight upper and lower bounds, which requires domain-specific bounds that are often hard to define. In [2], BnB was used on an abstraction of the planning space by taking advantage of motion constraints in the data

collection process. While BnB has shown utility in IPP, the efficacy of such approaches rapidly deteriorates as the problem space grows. Recent work [5]-[7] shows the advantages of using stochastic sampling in the planning process. In [5], IPP is decomposed into a constraint satisfaction problem and a traveling salesman problem, where the set of locations is modified stochastically and the TSP solver is invoked each time to search for better paths. In [6], sampling-based motion planning is used to generate a tree of vehicle trajectories; at each vertex of the motion tree, the expected reward is the maximum likelihood estimate. In [7], a hybrid approach between sampling-based motion planning and discrete search is employed. Similar to [6], a tree of vehicle trajectories is grown; however, instead of computing the expected reward with each expansion, a discrete solver is used to guide the expansion of the motion tree toward regions based on their expected reward. These approaches are generally not applicable to the problem investigated in this paper, where there are a large number of inspections that need to occur with very low variability between expected rewards.

Other recent work has explored the use of Monte Carlo tree search (MCTS) for the anytime generation of plans in large state spaces (see [8] for a comprehensive survey). While MCTS has mostly been applied to combinatorial games, such as the popular board game Go [9] and Lines of Action (LOA) [10], it has also been applied to a wide range of non-game applications. In [11], MCTS is implemented in a Mars rover domain; the authors use MCTS in a straightforward manner over a Bayesian network (BN), which allows the planner to plan scientifically valuable paths for the rover to investigate. In [12], MCTS is used to generate informative paths for a glider in the presence of air thermals; the problem studied in [12] considers both information gathering and energy replenishment.

In this paper, we present an MCTS planner and propose a novel heuristic to improve its performance. Our method yields a substantial improvement over a simple MCTS planner where actions at each simulation are selected at random. Our results inform single-agent deterministic problems with real-valued rewards, such as orienteering problems. We are particularly interested in subsea applications where any move other than moving straight incurs

a penalty. This study extends our prior work reported in [1] and [2].

III. PROBLEM FORMULATION

In this work, we are given an a priori discrete map of the environment. Provided this map, the objective is to search the environment for potential targets while simultaneously performing environmental characterization within a limited time horizon. By performing search and characterization, we are able to update the environmental map in situ so that the search vehicle can more effectively adapt its search strategy to the environment. This approach requires a model of the sensor to compute the value of searching a location. Once the value of searching each location is computed, the problem becomes the NP-hard IPP problem, in which the search vehicle must decide which locations to search within the limited time budget. We follow the formulation presented in [2] and provide an overview in Section

III-A. We note that the findings of Section III-A are not novel; they are presented here for completeness. In Section III-B, we provide additional details specific to this study.

A. Subsea Search and Characterization

Search theory is concerned with finding an optimal allocation of available

search effort to locate a lost or hidden target, such that a reward specified as a measure of search effectiveness is maximized. Bernard Koopman offered one of the first systematic approaches to the search problem [13]. His work laid the foundations of the problem, and the search problem has since received a great deal of attention, mainly from the operations research community; some notable examples include [14]-[17].

Fig. 1: A subsea search environment showing an unpublished data set from Boston Harbor taken in 2016 (see [2] for details). The x and y axes represent unit-less dimensions corresponding to the number of cells in the image (61 × 61). Cell-wise rewards are normalized to 1.

In a realistic search problem, there are certain limitations on performing search at a location, such as imperfect sensor measurements or uncertain knowledge of the environment at that location. Noisy sensor measurements often include missed detections and false alarms. Local environmental conditions can affect sensor performance, the probability of detection, and the probability of false alarm. Although there is an extensive literature on search theory (see, e.g., [18], [19]), the challenges due to false alarms are seldom addressed. Exceptions include [14], [16], [17], [20]-[22], but uncertainty in environmental conditions is not accounted for in these studies. In our prior work [2], the objective function we proposed for assessing the value of searching a location accounts for multiple targets, false alarms, and uncertainty in the environment.

1) Environment Model: The search environment is expressed as a grid G ⊂ R² with n_r rows and n_c columns. With each cell, we associate random variables X and E, where X is the number of targets and E represents the environmental conditions. We presume that, for cells i and j ≠ i, X_i is independent of X_j and E_i is independent of E_j. We assume the search sensor performance is dependent on the environment, and the possible set of environmental conditions is {b_1, b_2, ..., b_m}. There is an a priori distribution on the environment type. Specifically, for the ith cell of the grid, g_i ∈ G, the distribution is expressed as θ_i = (p_1(i), p_2(i), ..., p_m(i)), where p_j(i) = P(E_i = b_j) is the probability that the environment in the cell is b_j. Visiting a location yields both a noisy measurement z ∈ Z of the number of targets and a measurement y ∈ Y of the environmental conditions at that location. We consider that the sensor models

P(z | x, b_j) and P(y | b_j) are known. The former represents the likelihood of observing z targets when there are x targets and the true environment is b_j, and the latter represents the likelihood of observing environment y when the true environment is b_j. We omit the details of the specific sensor model that we use in this study; the details can be found in prior work [2], [23]. After acquiring the measurements z and y, we first apply Bayes' rule to update the probability distribution of the number of targets and the probability distribution of the environmental conditions,

P(x | z, b_j) ∝ P(z | x, b_j) P(x | b_j)    (1)

P(b_j | y) ∝ P(y | b_j) P(b_j)    (2)

and we then compute the posterior distribution of the number of targets unconditioned on the environment,

P(x | z, y) = Σ_{j=1}^{m} P(x | z, b_j) P(b_j | y).    (3)

2) Benefit of Searching a Cell: We assess the value of searching a location through a decision-theoretic cost function, and specifically consider that the benefit of searching a cell is the expected reduction in Bayes risk due to measurements z and y, where Bayes risk is associated with the inaccuracy of an estimate of the number of targets. Given the measured data z, we first define the loss corresponding to the estimate δ(z) of the number of targets x,

L(x, δ(z)) = c_i (x − δ(z)),  i ∈ {1, 2}    (4)

where c_1 > 0 and c_2 > 0 are the relative costs of underestimating and overestimating the number of targets, and the heuristic in (19) includes a tunable parameter. In (19), the denominator of the first term accounts for the effect of the turn penalty and also for the ratio of the mission length to the grid size; the numerator of the second term accounts for the average of the mean reward and the maximum attainable reward in the grid; and the denominator of the second term accounts for the total reward along a line.
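The update in (1)-(3) can be exercised numerically. The state space and every probability table below are invented illustrative numbers, not the paper's sensor model:

```python
# Hypothetical discretization: x in {0, 1, 2} targets, environment types
# {b1, b2}. All tables are made-up numbers for illustration only.
p_x_given_b  = {"b1": [0.6, 0.3, 0.1], "b2": [0.3, 0.4, 0.3]}
p_b          = {"b1": 0.7, "b2": 0.3}
p_z_given_xb = {"b1": [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]],
                "b2": [[0.5, 0.3, 0.2], [0.3, 0.4, 0.3], [0.2, 0.3, 0.5]]}
p_y_given_b  = {"b1": 0.8, "b2": 0.4}   # likelihood of the observed y

def posterior_x(z):
    """Posterior P(x | z, y) via equations (1)-(3)."""
    # (2): P(b_j | y) is proportional to P(y | b_j) P(b_j)
    w = {b: p_y_given_b[b] * p_b[b] for b in p_b}
    norm = sum(w.values())
    post_b = {b: v / norm for b, v in w.items()}
    post = [0.0, 0.0, 0.0]
    for b in p_b:
        # (1): P(x | z, b_j) is proportional to P(z | x, b_j) P(x | b_j)
        joint = [p_z_given_xb[b][x][z] * p_x_given_b[b][x] for x in range(3)]
        total = sum(joint)
        # (3): mix the per-environment posteriors with P(b_j | y)
        for x in range(3):
            post[x] += post_b[b] * joint[x] / total
    return post

post = posterior_x(z=2)
```

With these numbers, observing a high target count z = 2 shifts posterior mass toward more targets, and the resulting distribution is what the Bayes-risk benefit in (4) would be evaluated against.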

Informally speaking, the average of the mean reward and the maximum attainable reward in the grid represents an optimistic estimate of the reward the vehicle would acquire if it made a turn. Thus, making a turn is preferred if the next cell's reward is sufficiently smaller than this estimate. There is a trade-off between the turn penalty and the expected increase in reward for making a turn; e.g., for a large turn penalty, the value of making a turn is smaller. We make a turn at the cell that maximizes the turn gain for the ith line along direction d,

max_j g(i, j, d).    (20)

Algorithm 3 shows the pseudo-code for the RANDOM-LINES heuristic. The first step of the heuristic is to randomly select a survey direction from the cardinal directions (East, West, South, and North) with which to survey the search area (Alg. 3, line 2). Then, given the current line i, the current direction d, and the current location j along line i in direction d (Alg. 3, line 3), the heuristic selects the number of cells, denoted by n in the pseudo-code, that the vehicle will move forward before making a turn (Alg. 3, lines 4-7). After moving along line i for n cells in direction d (Alg. 3, line 8), the heuristic selects another line parallel to line i to sweep (Alg. 3, line 10). With probability 0.5 we select the next parallel line, and with the remaining probability 0.5 we select one of the remaining lines after taking a turn action a (Alg. 3, line 18).

Algorithm 3 Random Lines Monte Carlo Search
1: function HEURISTICPOLICY(s)
2:
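Since the excerpt truncates Algorithm 3, the following sketch only follows the prose description above: sweep along a line for a random number of cells, then switch lines, paying a turn penalty per line change. The reward bookkeeping, the 1-3 cell run length, and the boundary clamping are assumptions, not the paper's pseudo-code:

```python
import random

def random_lines_rollout(rewards, start, budget, turn_penalty=2.5, rng=random):
    """Estimate a leaf's value with a RANDOM-LINES-style rollout: sweep
    along grid lines, turning only occasionally, and charge
    `turn_penalty` per line change. A sketch consistent with the prose
    description, not the (truncated) Algorithm 3 itself."""
    rows, cols = len(rewards), len(rewards[0])
    r, c = start
    direction = rng.choice(["E", "W", "N", "S"])  # Alg. 3, line 2 analogue
    horizontal = direction in ("E", "W")
    step = 1 if direction in ("E", "S") else -1
    value, spent = rewards[r][c], 0
    visited = {(r, c)}
    while spent < budget:
        n = rng.randint(1, 3)          # cells to move before a possible turn
        for _ in range(n):             # sweep along the current line
            if spent >= budget:
                break
            if horizontal:
                c = min(max(c + step, 0), cols - 1)
            else:
                r = min(max(r + step, 0), rows - 1)
            spent += 1
            if (r, c) not in visited:
                visited.add((r, c))
                value += rewards[r][c]
        # switch lines: adjacent parallel line w.p. 0.5, else a random line
        if horizontal:
            r = min(r + 1, rows - 1) if rng.random() < 0.5 else rng.randrange(rows)
        else:
            c = min(c + 1, cols - 1) if rng.random() < 0.5 else rng.randrange(cols)
        step = -step                   # reverse sweep direction (boustrophedon)
        value -= turn_penalty          # each line change costs a turn
        spent += 1
    return value
```

Plugged in as the `rollout` argument of an MCTS planner, this policy evaluates leaves with few-turn sweeps instead of a random walk, which is the behavior the RANDOM-LINES heuristic is designed to reward.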
