Proving Evaluation Details

These numbers may be subject to change if the participant community finds that they are too extensive and require too much time. Please voice concerns in the forums.

About This Page

Below are the current details about the proving runs for the rl-competition. In all cases, the evaluation criterion is total cumulative reward over all steps on all MDPs. There is no separate exploration vs. exploitation phase, no "free" learning time, and no "frozen" test time. How best to trade these things off is left to the competitors.

We have chosen to run with a step limit rather than an episode or time limit. This implies that the number of episodes and the wall-clock time can vary drastically between competitors. For example, a poor Tetris player may experience hundreds of thousands of episodes on an MDP (each episode lasting only a few steps), while a good player may see only a few episodes (each lasting many steps). Similarly, some agents may take hours to complete a proving run on a laptop computer, while others may take days on a computing cluster. These may be controversial choices; we will see.

The leaderboard will report the best proving run done by each team, sorted by cumulative reward in descending order. When a shorter summary statistic is available, it may be reported instead. For example, in Mountain Car, reporting the number of episodes completed is as good as reporting the cumulative reward.

Good luck.

Mountain Car

Vital Statistics
- Number of steps per MDP = 100 thousand
- Number of MDPs = 50
- Total number of steps per proving run = 5 million
- Estimated time for a proving run on commodity hardware with a random (cheap) agent = 20-40 minutes
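The evaluation rule described above can be sketched as a single loop: each MDP gets a fixed step budget, reward is summed over every step, and an episode is simply restarted whenever it ends, so the episode count falls out of the agent's behavior rather than being fixed in advance. This is a minimal sketch under stated assumptions, not the competition software: `ToyChainMDP`, its `reset`/`step` interface, `proving_run`, and `leaderboard` are hypothetical names, and the toy reward (+1 for reaching the end of a chain, 0 otherwise) is an assumption chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple


class ToyChainMDP:
    """Hypothetical toy MDP (not a competition domain).

    The agent starts at position 0; action 1 moves right, any other action
    stays put. Reaching position `length` ends the episode with reward +1;
    every other step gives reward 0. These dynamics are an assumption.
    """

    def __init__(self, length: int) -> None:
        self.length = length
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        if action == 1:
            self.pos += 1
        terminal = self.pos >= self.length
        reward = 1.0 if terminal else 0.0
        return self.pos, reward, terminal


def proving_run(mdps: List[ToyChainMDP],
                policy: Callable[[int], int],
                steps_per_mdp: int) -> Tuple[float, int]:
    """Run a fixed step budget on each MDP; return (cumulative reward, episodes)."""
    total_reward = 0.0
    episodes = 0
    for mdp in mdps:
        obs = mdp.reset()
        # The budget counts steps, not episodes: short episodes simply mean
        # more resets within the same budget.
        for _ in range(steps_per_mdp):
            # A learning agent would also update itself here: there is no
            # separate training phase, so every step's reward counts.
            obs, reward, terminal = mdp.step(policy(obs))
            total_reward += reward
            if terminal:
                episodes += 1
                obs = mdp.reset()
    return total_reward, episodes


def leaderboard(runs: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Keep each team's best run, then sort by cumulative reward, highest first."""
    best: Dict[str, float] = {}
    for team, score in runs:
        if team not in best or score > best[team]:
            best[team] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    mdps = [ToyChainMDP(length=5) for _ in range(3)]
    print(proving_run(mdps, lambda obs: 1, steps_per_mdp=20))  # (12.0, 12)
    print(leaderboard([("team-a", 3.0), ("team-b", 12.0), ("team-a", 7.0)]))
    # [('team-b', 12.0), ('team-a', 7.0)]
```

With this toy reward, cumulative reward equals the number of completed episodes, so either statistic ranks agents identically; that is the kind of situation in which a shorter summary statistic can stand in for cumulative reward.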

Helicopter Hovering

Vital Statistics
- Number of steps per MDP = 6 million
- Number of MDPs = 15
- Total number of steps per proving run = 90 million
- Estimated time for a proving run on commodity hardware with a random (cheap) agent = 8-12 hours

Tetris

Vital Statistics
- Number of steps per MDP = 5 million
- Number of MDPs = 10
- Total number of steps per proving run = 50 million
- Estimated time for a proving run on commodity hardware with a random (cheap) agent = 6-10 hours

Real Time Strategy

Vital Statistics
- Number of steps per MDP = 37.5 million
- Number of MDPs = 1
- Total number of steps per proving run = 37.5 million
- Estimated time for a proving run on commodity hardware with a random (cheap) agent = 15-20 hours
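The vital statistics above can be cross-checked with a little arithmetic: the total steps per proving run is the steps per MDP times the number of MDPs, and dividing that total by the estimated wall-clock time gives the average step rate a random agent would have to sustain. The sketch below only re-derives numbers from this page; `total_steps` and `implied_throughput` are illustrative helper names, and the step rates are averages implied by the time estimates, not measurements.

```python
# Figures copied from the vital statistics above:
# (steps per MDP, number of MDPs, fastest estimate in hours, slowest estimate in hours)
DOMAINS = {
    "Mountain Car": (100_000, 50, 20 / 60, 40 / 60),
    "Helicopter Hovering": (6_000_000, 15, 8, 12),
    "Tetris": (5_000_000, 10, 6, 10),
    "Real Time Strategy": (37_500_000, 1, 15, 20),
}


def total_steps(steps_per_mdp: int, num_mdps: int) -> int:
    """Total steps in one proving run: every MDP gets the same step budget."""
    return steps_per_mdp * num_mdps


def implied_throughput(total: int, hours: float) -> float:
    """Average steps per second needed to finish `total` steps in `hours`."""
    return total / (hours * 3600)


for name, (steps, mdps, fast_h, slow_h) in DOMAINS.items():
    total = total_steps(steps, mdps)
    # The slower time estimate gives the lower bound on the step rate.
    low = implied_throughput(total, slow_h)
    high = implied_throughput(total, fast_h)
    print(f"{name}: {total:,} steps, about {low:,.0f}-{high:,.0f} steps/s")
```

For example, Mountain Car's 5 million steps in 20-40 minutes works out to roughly 2,100-4,200 steps per second, while the Real Time Strategy estimate implies only about 520-690 steps per second.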