Proving Evaluation Details
These numbers may be subject to change if the participant community finds that they are too extensive and require too much time. Please voice concerns in the forums.

About This Page

Below are the current details about the proving runs for the rl-competition.  In all cases, the evaluation criterion is the total cumulative reward over all steps on all MDPs.  There is no separate exploration vs. exploitation phase, no "free" learning time, and no "frozen" test time.  How best to trade these things off is left to the competitors.

We have chosen to run with a step limit as opposed to an episode or time limit.  This implies that the number of episodes or the wall-clock time can vary drastically between competitors.  For example, a poor Tetris player may experience hundreds of thousands of episodes on an MDP (each episode is a few steps), while a good player may see only a few episodes (each episode will be many steps). Similarly, some agents may take hours to complete a proving run on a laptop computer, while others may take days running on a computing cluster.  These may be controversial choices; we will see.
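To make the step-limit idea concrete, here is a minimal sketch of an evaluation loop that stops after a fixed number of steps, however many episodes that turns out to be. It is not the competition's actual harness: the `ToyEpisodicEnv` class, its reward scheme, and the `run_with_step_limit` function are all invented for illustration.

```python
import random


class ToyEpisodicEnv:
    """Hypothetical stand-in environment (not a competition domain).

    Each step ends the episode with probability p_end. The reward is an
    assumed scheme: -1 on ordinary steps, 0 on the terminal step.
    """

    def __init__(self, p_end, seed=0):
        self.p_end = p_end
        self.rng = random.Random(seed)

    def start(self):
        return 0  # dummy observation

    def step(self, action):
        terminal = self.rng.random() < self.p_end
        reward = 0.0 if terminal else -1.0
        return reward, 0, terminal


def run_with_step_limit(env, choose_action, step_limit):
    """Run until step_limit steps are used, across as many episodes as that takes."""
    total_reward = 0.0
    episodes_completed = 0
    steps_used = 0
    obs = env.start()
    while steps_used < step_limit:
        reward, obs, terminal = env.step(choose_action(obs))
        total_reward += reward
        steps_used += 1
        if terminal:
            # Learning and evaluation are not separated: just start again.
            episodes_completed += 1
            obs = env.start()
    return total_reward, episodes_completed, steps_used


def always_zero(obs):
    """Trivial placeholder policy."""
    return 0


# Short episodes (p_end=0.5) versus long episodes (p_end=0.01), same step budget.
short_run = run_with_step_limit(ToyEpisodicEnv(0.5, seed=1), always_zero, 1000)
long_run = run_with_step_limit(ToyEpisodicEnv(0.01, seed=1), always_zero, 1000)
```

Both runs consume exactly the same number of steps, but the run with short episodes completes far more of them, which is the effect described above.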

The leaderboard will report the best proving run done by each team, sorted by cumulative reward in descending order.  When a shorter summary statistic is available, it may be reported instead.  For example, in Mountain Car, reporting the number of episodes completed is as good as reporting the cumulative reward.
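The ranking rule can be sketched in a few lines. This is only an illustration of "best run per team, sorted by cumulative reward in descending order"; the function name, the team names, and the reward values are made up, and the real leaderboard code is not shown on this page.

```python
def build_leaderboard(runs):
    """Keep each team's best proving run and sort teams by reward, highest first.

    runs: iterable of (team, cumulative_reward) pairs, one pair per proving run.
    """
    best = {}
    for team, reward in runs:
        if team not in best or reward > best[team]:
            best[team] = reward
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: team_a submitted two proving runs, team_b one.
example_runs = [("team_a", -4200.0), ("team_b", -3900.0), ("team_a", -3500.0)]
leaderboard = build_leaderboard(example_runs)
```

Only the better of team_a's two runs counts, so team_a is listed ahead of team_b.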

Good luck. 

 

Mountain Car

Vital Statistics

  • Number of steps per MDP =  100 thousand
  • Number of MDPs = 50
  • Total number of steps per proving run = 5 million
  • Estimated time to do proving run on commodity hardware using random (cheap) agent = 20-40 minutes

Helicopter Hovering

Vital Statistics

  • Number of steps per MDP =  6 million
  • Number of MDPs = 15
  • Total number of steps per proving run = 90 million
  • Estimated time to do proving run on commodity hardware using random (cheap) agent = 8-12 hours

Tetris

Vital Statistics 

  • Number of steps per MDP =  5 million
  • Number of MDPs = 10
  • Total number of steps per proving run = 50 million
  • Estimated time to do proving run on commodity hardware using random (cheap) agent = 6-10 hours

 

Real Time Strategy

 Vital Statistics 

  • Number of steps per MDP =  37.5 million
  • Number of MDPs = 1
  • Total number of steps per proving run = 37.5 million
  • Estimated time to do proving run on commodity hardware using random (cheap) agent = 15-20 hours 
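As a quick consistency check, the totals listed above are simply the steps per MDP multiplied by the number of MDPs. The sketch below recomputes them from the figures on this page; the dictionary layout and the helper names are assumptions made for this example.

```python
# Figures copied from the "Vital Statistics" lists above.
PROVING_RUNS = {
    "Mountain Car": {"steps_per_mdp": 100_000, "num_mdps": 50, "stated_total": 5_000_000},
    "Helicopter Hovering": {"steps_per_mdp": 6_000_000, "num_mdps": 15, "stated_total": 90_000_000},
    "Tetris": {"steps_per_mdp": 5_000_000, "num_mdps": 10, "stated_total": 50_000_000},
    "Real Time Strategy": {"steps_per_mdp": 37_500_000, "num_mdps": 1, "stated_total": 37_500_000},
}


def total_steps(spec):
    """Total steps in one proving run: steps per MDP times number of MDPs."""
    return spec["steps_per_mdp"] * spec["num_mdps"]


def implied_steps_per_second(total, hours):
    """Average throughput needed to finish `total` steps in `hours` hours."""
    return total / (hours * 3600.0)


for domain, spec in PROVING_RUNS.items():
    # Every stated total should match the product of the other two figures.
    assert total_steps(spec) == spec["stated_total"], domain

# Mountain Car at the slow end of its 20-40 minute estimate (40 minutes).
mountain_car_rate = implied_steps_per_second(5_000_000, 40 / 60)
```

The same helper can be applied to the other time estimates to get a rough idea of the per-step budget a random agent used on commodity hardware.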
 

NOTE: Registration for message boards has been DISABLED because of SPAM. Please e-mail brian@rl-competition.org for an account.