Practical Issues in Temporal Difference Learning

Abstract

This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world problems. A number of important practical issues are identified and discussed from a general theoretical perspective. These practical issues are then examined in the context of a case study in which TD(λ) is applied to learning the game of backgammon from the outcome of self-play. This is apparently the first application of this algorithm to a complex nontrivial task. It is found that, with zero knowledge built in, the network is able to learn from scratch to play the entire game at a fairly strong intermediate level of performance, which is clearly better than conventional commercial programs, and which in fact surpasses comparable networks trained on a massive human expert data set. The hidden units in these networks have apparently discovered useful features, a longstanding goal of computer games research. Furthermore, when a set of hand-crafted features is added to the input representation, the resulting networks reach a near-expert level of performance, and have achieved good results against world-class human play.

Cite

Text

Tesauro. "Practical Issues in Temporal Difference Learning." Neural Information Processing Systems, 1991.

Markdown

[Tesauro. "Practical Issues in Temporal Difference Learning." Neural Information Processing Systems, 1991.](https://mlanthology.org/neurips/1991/tesauro1991neurips-practical/)

BibTeX

@inproceedings{tesauro1991neurips-practical,
  title     = {{Practical Issues in Temporal Difference Learning}},
  author    = {Tesauro, Gerald},
  booktitle = {Neural Information Processing Systems},
  year      = {1991},
  pages     = {259-266},
  url       = {https://mlanthology.org/neurips/1991/tesauro1991neurips-practical/}
}