|
Creator: Brian Tanner, University of Alberta Tetris is a falling-blocks puzzle video game originally designed and programmed by Alexey Pajitnov in 1985. In Tetris, a pseudorandom sequence of tetrominoes (sometimes called "tetrads" in older versions) - shapes composed of four square blocks each - fall down the playing field. The object of the game is to manipulate these tetrominoes by moving each one sideways and rotating it by 90 degree units, with the aim of creating a horizontal line of blocks without gaps. When such a line is created, it disappears, and the blocks above (if any) fall. The game ends when the player "tops out", that is, when the stack of tetrominoes reaches the top of the playing field and no new tetrominoes are able to enter. Despite its simple rules, playing tetris well requires a complex strategy and lots of experience. Furthermore, Demaine et al. have shown that Tetris is hard in a mathematical sense as well [Demaine et al., 2003]: finding the optimal strategy is NP-hard even if the sequence of tetrominoes is known in advance. These properties make Tetris an appealing benchmark problem for testing reinforcement learning (and other machine learning) algorithms. Tetris was first formalized as a Markov decision problem by Tsitsiklis and Van Roy (1996). They achieved surprising perfomance using a simple linear architecture and a small set of expert game features. There have been several studies on applying reinforcement learning methods to Tetris, but none have significantly outperformed Tsitsiklis and Van Roy's results (see [Ramon and Driessens, 2004], [Bertsekas and Tsitsiklis, 1996], [Lagoudakis et al., 2002], [Farias and van Roy, 2006] and [Kakade, 2001]). Recent work by Szita & Lorincz (2006) presented result from a cross entropy method that are several orders of magnitude better than Tsitsiklis and Van Roy. Reinforcement learning methods can learn to play tetris well, but can learning methods achieve better performance than Istvan Szita and Andras Lorincz or the best human players? The competition domain is based on the Van Roy (1995) specification of the Tetris problem. The main difference between Van Roy's specification and our version of Tetris is the action space. In the competition problem the agent chooses to rotate or move the peice one space, as it falls (like the video game). In Van Roy's specification the agent chooses a final position and orientation for each piece when it first appears. The competition version of tetris generalized. See the Rules Page for more information about the generalized evaluation paradigm. Technical Details Observation Space: high dimensional, discrete valued containing: - bit map: binary representation of the current board (row major order)
- bit vector: indentifying the falling piece
- two integers: board size, number of rows and columns
Action Space: 1 dimensional, discrete valued - manipulation of falling piece, eg: rotate and move left, right, down and drop
Rewards: positive function of rows eliminated on each step Note: the competition software will provide your agent with a task specification string that describes the basic inputs and outputs of the particular problem instance your agent is facing. For the competition, the ranges provided in task specification may not be tight; they provide a rough approximation of the actual observation and action ranges. More documentation of the the task specification string can be found here .
References [Bertsekas and Tsitsiklis, 1996] Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific.
[Demaine et al., 2003] Demaine, E. D., Hohenberger, S., and Liben-Nowell, D. (2003). Tetris is hard, even to approximate. In Proc. 9th International Computing and Combinatorics Conference (COCOON 2003), pages 351–363. [Farias and van Roy, 2006] Farias, V. F. and van Roy, B. (2006). Probabilistic and Randomized Methods for Design Under Uncertainty, chapter Tetris: A Study of Randomized Constraint Sampling. Springer-Verlag UK.
[Kakade, 2001] Kakade, S. (2001). A natural policy gradient. In Advances in Neural Information Processing Systems (NIPS 14), pages 1531–1538.
[Lagoudakis et al., 2002] Lagoudakis, M. G., Parr, R., and Littman, M. L. (2002). Least-squares methods in reinforcement learning for control. In SETN ’02: Proceedings of the Second Hel lenic Conference on AI, pages 249–260, London, UK. Springer-Verlag.
[Ramon and Driessens, 2004] Ramon, J. and Driessens, K. (2004). On the numeric stability of gaussian processes regression for relational reinforcement learning. In ICML-2004 Workshop on Relational Reinforcement Learning, pages 10–14. [Szita and Lorincz, 2006] Istvan Szita, András Lörincz: Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation 18(12): 2936-2941 (2006)
|