Mountain Car
Creator: Adam White, University of Alberta

In the Mountain Car problem, an agent must drive an underpowered car up a steep mountain road. Since gravity is stronger than the car’s engine, even at full throttle the car cannot simply accelerate up the steep slope. The car’s movement is described by two continuous output variables, position and velocity, and one discrete input representing the acceleration of the car.  

Mountain Car is interesting because the car’s position on the hill and its velocity are real-valued. Therefore, a learning algorithm must use a function approximator to learn a good policy. Mountain Car is also interesting because a successful control policy must first drive the car backwards, up the other side of the valley, to gain enough momentum to drive forwards up the hill. This means the learning algorithm must move away from the goal, incurring additional negative reward, to discover the solution. Finally, the effect of an action on the state of the system is not immediately measurable. Thus, learning algorithms must assign credit to actions taken several time steps in the past.
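The need to back away from the goal can be seen in a short simulation. The sketch below assumes the textbook dynamics and constants from Sutton and Barto (1998); the generalized competition instances may use different values. The function names, the start state (position -0.5, velocity 0) and the step limit are illustrative choices, not part of the competition software.

```python
import math

# Constants follow the textbook version of the task (Sutton and Barto, 1998).
# They are assumptions here; competition instances may differ.
MIN_X, MAX_X = -1.2, 0.5   # position bounds; the goal is at MAX_X
MAX_SPEED = 0.07           # velocity is clipped to [-MAX_SPEED, MAX_SPEED]


def step(x, v, action):
    """Advance one time step; action is -1 (reverse), 0 (neutral) or +1 (forward)."""
    v = v + 0.001 * action - 0.0025 * math.cos(3.0 * x)
    v = max(-MAX_SPEED, min(MAX_SPEED, v))
    x = x + v
    if x <= MIN_X:         # hitting the left bound stops the car
        x, v = MIN_X, 0.0
    return x, v


def steps_to_goal(policy, x=-0.5, v=0.0, max_steps=1000):
    """Return how many steps the policy needs to reach the goal, or None."""
    for t in range(1, max_steps + 1):
        x, v = step(x, v, policy(x, v))
        if x >= MAX_X:
            return t
    return None


def always_forward(x, v):
    """Full throttle towards the goal on every step."""
    return 1


def follow_momentum(x, v):
    """Push in whichever direction the car is already moving."""
    return 1 if v >= 0.0 else -1


if __name__ == "__main__":
    print("always forward: ", steps_to_goal(always_forward))
    print("follow momentum:", steps_to_goal(follow_momentum))
```

Under these assumptions, the always-forward policy never reaches the goal within the step limit, while the hand-written momentum policy does. A learning agent has to discover the second kind of behaviour from reward alone.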

The Mountain Car task was originally proposed by Andrew Moore in his PhD dissertation (1990) and has been widely studied since. Singh and Sutton (1996) later used Mountain Car in their work on eligibility traces and formalized the state update equations for the position and velocity of the car based on Moore's original problem specification. Over the years there have been several variations on Singh and Sutton's version of the problem: different reward functions, starting states and termination conditions. The competition domain is based on the Mountain Car specification from Sutton & Barto's reinforcement learning book (1998) but will be generalized. See the Rules Page for more information about the generalized evaluation paradigm.
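For reference, the state update equations as they appear in Sutton and Barto (1998) are reproduced below, with x_t the position, v_t the velocity and a_t the action. The constants are those of the textbook version; the generalized competition instances may use different values.

```latex
\begin{aligned}
v_{t+1} &= \operatorname{bound}\bigl[\, v_t + 0.001\, a_t - 0.0025 \cos(3 x_t) \,\bigr], \\
x_{t+1} &= \operatorname{bound}\bigl[\, x_t + v_{t+1} \,\bigr],
\end{aligned}
\qquad a_t \in \{-1, 0, +1\}
```

Here the bound operation enforces -1.2 <= x <= 0.5 and -0.07 <= v <= 0.07. When the position reaches the left bound the velocity is reset to zero, and when it reaches the right bound the goal has been reached and the episode ends.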


 

Technical Details

Observation Space: 2 dimensional, continuous valued

  1. car position
  2. car velocity   

Action Space: 1 dimensional, discrete valued

  1. reverse, neutral and forward

Rewards: negative reward per step


Note: the competition software will provide your agent with a task specification string that describes the basic inputs and outputs of the particular problem instance your agent is facing. For the competition, the ranges provided in the task specification may not be tight; they provide a rough approximation of the actual observation and action ranges. More documentation of the task specification string can be found here.
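Because the advertised ranges are only approximate, an agent that scales observations before feeding them to a function approximator may want to guard against values outside those ranges. The sketch below is one way to do that. It does not parse the task specification string; the ranges are assumed to have been extracted already. The function names, the placeholder ranges and the grid resolution are all hypothetical.

```python
def scale_observation(observation, ranges):
    """Map each component to [0, 1] using (low, high) pairs.

    Values outside the advertised range are clipped, since the ranges in the
    task specification are only a rough approximation.
    """
    scaled = []
    for value, (low, high) in zip(observation, ranges):
        fraction = (value - low) / (high - low)
        scaled.append(min(1.0, max(0.0, fraction)))
    return scaled


def grid_cell(observation, ranges, bins=8):
    """Index of the grid cell containing the observation (coarse state aggregation)."""
    index = 0
    for fraction in scale_observation(observation, ranges):
        cell = min(bins - 1, int(fraction * bins))  # keep fraction == 1.0 in the last cell
        index = index * bins + cell
    return index


# Placeholder ranges for (position, velocity), as if already read from a
# task specification; the real values come from the competition software.
ASSUMED_RANGES = [(-1.2, 0.6), (-0.07, 0.07)]

if __name__ == "__main__":
    print(scale_observation([-0.5, 0.0], ASSUMED_RANGES))
    print(grid_cell([-0.5, 0.0], ASSUMED_RANGES))
```

A single grid is the simplest form of function approximation over the two observation variables. If the advertised ranges turn out to be much wider than the values actually observed, most cells will go unused, so an agent may prefer to track the observed minimum and maximum itself.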

 

References 

[Moore, 1990] Moore, A. (1990) Efficient Memory-Based Learning for Robot Control. PhD thesis, University of Cambridge, November 1990.

[Singh and Sutton, 1996] Singh, S.P. and Sutton, R.S. (1996) Reinforcement learning with replacing eligibility traces. Machine Learning 22(1/2/3):123-158.

[Sutton and Barto, 1998] Sutton, R.S. and Barto, A.G. (1998) Reinforcement Learning: An Introduction. A Bradford Book. The MIT Press, Cambridge, Massachusetts.

 

 
