Temporal Difference Learning of Backgammon Strategy

Abstract

This paper presents a case study in which the TD(λ) algorithm for training connectionist networks, proposed in (Sutton, 1988), is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex, nontrivial task. It is found that, with zero knowledge built in, networks are able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of handcrafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results in tests against world-class human play.
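The abstract names TD(λ) (Sutton, 1988) without spelling out the update. In Sutton's formulation, the weight change at step t is alpha * (P_{t+1} - P_t) * sum_{k=1..t} lambda^(t-k) * grad_w P_k, where the final prediction is replaced by the actual outcome of the sequence. The decayed sum of gradients can be kept incrementally as an "eligibility trace." The sketch below is a minimal illustration under loudly stated assumptions. It replaces the backgammon network with a linear evaluator over one-hot states and replaces self-play with Sutton's five-state random walk. It also updates the weights online after every step. The function name `td_lambda_random_walk` and all parameter values are hypothetical choices for illustration, not taken from the paper.

```python
import random


def td_lambda_random_walk(episodes=20000, alpha=0.002, lam=0.7, seed=0):
    """Hypothetical illustration: online TD(lambda) with accumulating
    eligibility traces on a 5-state random walk (a stand-in for
    backgammon, NOT the paper's setup).

    Non-terminal states are 0..4; every episode starts in state 2.
    Stepping off the right end yields outcome 1, off the left end 0.
    The evaluator is linear with one-hot features, so V(s) = w[s].
    """
    rng = random.Random(seed)
    n_states = 5
    w = [0.5] * n_states  # initial predictions

    for _ in range(episodes):
        trace = [0.0] * n_states  # decayed sum of past gradients of V
        s = n_states // 2
        while True:
            # trace_t = lambda * trace_{t-1} + grad_w V(s_t);
            # with one-hot features the gradient is the unit vector for s.
            for i in range(n_states):
                trace[i] *= lam
            trace[s] += 1.0

            s_next = s + rng.choice((-1, 1))
            terminal = s_next < 0 or s_next >= n_states
            if s_next < 0:
                target = 0.0  # actual outcome replaces the last prediction
            elif s_next >= n_states:
                target = 1.0
            else:
                target = w[s_next]  # next prediction P_{t+1}

            # TD error (P_{t+1} - P_t) applied along the eligibility trace
            delta = target - w[s]
            for i in range(n_states):
                w[i] += alpha * delta * trace[i]

            if terminal:
                break
            s = s_next
    return w


if __name__ == "__main__":
    print([round(v, 3) for v in td_lambda_random_walk()])
```

With these settings, the learned values should settle near the exact probabilities of exiting on the right, (s + 1) / 6 for s = 0..4. The paper's networks are multilayer, so the gradient in the trace would come from backpropagation rather than from a one-hot vector. This sketch deliberately leaves that step out.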

Cite

Text

Tesauro. "Temporal Difference Learning of Backgammon Strategy." International Conference on Machine Learning, 1992. doi:10.1016/B978-1-55860-247-2.50063-2

BibTeX

@inproceedings{tesauro1992icml-temporal,
  title     = {{Temporal Difference Learning of Backgammon Strategy}},
  author    = {Tesauro, Gerald},
  booktitle = {International Conference on Machine Learning},
  year      = {1992},
  pages     = {451--457},
  doi       = {10.1016/B978-1-55860-247-2.50063-2},
  url       = {https://mlanthology.org/icml/1992/tesauro1992icml-temporal/}
}