|
Creators: Pieter Abbeel, Adam Coates, Andrew Y. Ng, Stanford University.

Autonomous helicopter flight represents a challenging control problem with high-dimensional, asymmetric, noisy, nonlinear, non-minimum-phase dynamics. Though helicopters are significantly harder to control than fixed-wing aircraft, they are uniquely suited to many applications requiring either low-speed flight or stable hovering. The control of autonomous helicopters thus provides an important and challenging testbed for learning and control algorithms.

The competition environment simulates an XCell Tempest helicopter in the flight regime close to hover. The agent's objective is to hover the helicopter by manipulating four continuous control inputs based on a 12-dimensional state space. A few pictures of the helicopter have been included with the simulator.

In the last few years, considerable progress has been made in finding good controllers for helicopters [Abbeel et al, 2006]. Other recent examples of successful autonomous helicopter flight are given in [Bagnell and Schneider, 2001], [Gavrilets et al., 2004], [La Civita et al., 2006], [Ng et al., 2004], [Ng et al., 2004], [Roberts et al., 2003], and [Saripalli et al., 2003].

The competition domain is based on a simulator created by Andrew Ng's group at Stanford and will be generalized. See the Rules Page for more information about the generalized evaluation paradigm.

Technical Details

Observation Space: 12-dimensional, continuous-valued
- forward velocity
- sideways velocity (to the right)
- downward velocity
- helicopter x-coord position minus desired x-coord position (the helicopter's x-axis points forward)
- helicopter y-coord position minus desired y-coord position (the helicopter's y-axis points to the right)
- helicopter z-coord position minus desired z-coord position (the helicopter's z-axis points down)
- angular rate around helicopter's x axis
- angular rate around helicopter's y axis
- angular rate around helicopter's z axis
- quaternion x, y, z entries (observation components 10, 11, and 12)

Action Space: 4-dimensional, continuous-valued
- longitudinal (front-back) cyclic pitch
- lateral (left-right) cyclic pitch
- main rotor collective pitch
- tail rotor collective pitch
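
As a rough illustration of the layout above, the following Python sketch unpacks a 12-value observation into named fields and clips a 4-value action. The names `HoverObservation`, `unpack_observation`, `quaternion_w`, and `clip_action` are hypothetical and are not part of the competition software. The field order assumes the list order given above. The `[-1, 1]` action bounds are placeholders, since the actual ranges come from the task specification. Recovering the fourth quaternion entry assumes a unit quaternion with a non-negative scalar part, which this page does not state.

```python
import math
from typing import List, NamedTuple, Sequence


class HoverObservation(NamedTuple):
    """Hypothetical container; field order follows the observation list above."""
    vel_forward: float
    vel_sideways: float
    vel_down: float
    err_x: float   # x position minus desired x position
    err_y: float   # y position minus desired y position
    err_z: float   # z position minus desired z position
    rate_x: float  # angular rate around the helicopter's x axis
    rate_y: float
    rate_z: float
    quat_x: float
    quat_y: float
    quat_z: float


def unpack_observation(obs: Sequence[float]) -> HoverObservation:
    """Turn a flat 12-element observation into named fields."""
    if len(obs) != 12:
        raise ValueError(f"expected 12 observation values, got {len(obs)}")
    return HoverObservation(*(float(v) for v in obs))


def quaternion_w(o: HoverObservation) -> float:
    """Recover the scalar quaternion entry.

    Assumption: the orientation is a unit quaternion with w >= 0.
    """
    s = o.quat_x ** 2 + o.quat_y ** 2 + o.quat_z ** 2
    return math.sqrt(max(0.0, 1.0 - s))


def clip_action(action: Sequence[float],
                low: float = -1.0, high: float = 1.0) -> List[float]:
    """Clip the four controls to [low, high].

    Assumption: the default bounds are placeholders; read the real
    ranges from the task specification string.
    """
    if len(action) != 4:
        raise ValueError(f"expected 4 action values, got {len(action)}")
    return [min(high, max(low, float(a))) for a in action]
```

Naming the fields keeps controller code readable, for example `obs.err_z` for the altitude error, instead of relying on bare indices into the observation vector.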
Rewards: a function of the 12-dimensional observation.

End Conditions: The simulator is set up to run for 6000 timesteps, and each simulation step is 0.1 seconds, giving runs of 10 minutes. (The simulator runs faster than real time.) If the simulator enters the terminal state before 6000 timesteps have been completed, a large negative reward is given, corresponding to receiving the most negative achievable reward for all of the remaining time.

Additional pictures, as well as further information about the helicopter specifications, can be found here.

Note: the competition software will provide your agent with a task specification string that describes the basic inputs and outputs of the particular problem instance your agent is facing. For the competition, the ranges provided in the task specification may not be tight; they provide a rough approximation of the actual observation and action ranges. More documentation of the task specification string can be found here.
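
The run length and the early-termination penalty described under End Conditions come down to simple arithmetic, sketched below in Python. The function names and the `worst_step_reward` parameter are hypothetical. This page does not give the most negative per-step reward, so it is passed in as an argument rather than hard-coded.

```python
MAX_STEPS = 6000      # timesteps per run, from the End Conditions text
STEP_SECONDS = 0.1    # simulated seconds per timestep


def episode_minutes(max_steps: int = MAX_STEPS,
                    step_seconds: float = STEP_SECONDS) -> float:
    """Simulated duration of a full run, in minutes."""
    return max_steps * step_seconds / 60.0


def early_termination_penalty(steps_completed: int,
                              worst_step_reward: float,
                              max_steps: int = MAX_STEPS) -> float:
    """Penalty for entering the terminal state early.

    Assumption: the penalty equals the most negative per-step reward
    applied to every remaining timestep. worst_step_reward is a
    hypothetical input, not a value taken from the simulator.
    """
    if not 0 <= steps_completed <= max_steps:
        raise ValueError("steps_completed must be between 0 and max_steps")
    if worst_step_reward > 0:
        raise ValueError("worst_step_reward should not be positive")
    return (max_steps - steps_completed) * worst_step_reward
```

With a made-up worst per-step reward of -2.0, terminating after 1000 steps would cost 5000 × (-2.0) = -10000, which is why terminating early is never better than staying in the air with poor tracking.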
References

[Abbeel et al, 2006] Abbeel, Coates, Quigley, and Ng. An application of reinforcement learning to aerobatic helicopter flight. In NIPS 19.
[Bagnell and Schneider, 2001] J. Bagnell and J. Schneider. Autonomous helicopter control using reinforcement learning policy search methods. In International Conference on Robotics and Automation. IEEE, 2001.
[Gavrilets et al., 2004] V. Gavrilets, B. Mettler, and E. Feron. Human-inspired control logic for automated maneuvering of miniature helicopter. Journal of Guidance, Control, and Dynamics, 27(5):752–759, 2004.
[La Civita et al., 2006] M. La Civita, G. Papageorgiou, W. C. Messner, and T. Kanade. Design and flight testing of a high-bandwidth H∞ loop shaping controller for a robotic helicopter. Journal of Guidance, Control, and Dynamics, 29(2):485–494, March-April 2006.
[Ng et al., 2004] Andrew Y. Ng, H. Jin Kim, Michael Jordan, and Shankar Sastry. Autonomous helicopter flight via reinforcement learning. In NIPS 16, 2004.
[Ng et al., 2004] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang. Autonomous inverted helicopter flight via reinforcement learning. In Int’l Symposium on Experimental Robotics, 2004.
[Roberts et al., 2003] Jonathan M. Roberts, Peter I. Corke, and Gregg Buskey. Low-cost flight control system for a small autonomous helicopter. In IEEE Int’l Conf. on Robotics and Automation, 2003.
[Saripalli et al., 2003] S. Saripalli, J. F. Montgomery, and G. S. Sukhatme. Visually-guided landing of an unmanned aerial vehicle. IEEE Transactions on Robotics and Autonomous Systems, 2003.
|