Why Did TD-Gammon Work?

Abstract

Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000 parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself.
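
The phrase "simple hill-climbing in a relative fitness environment" can be illustrated with a minimal sketch. This is not the authors' implementation: the toy game, the hidden `TARGET` vector, the `strength` function, the noise level, the number of games per round, and the `blend` step toward a winning challenger are all assumptions made for illustration. The only idea taken from the abstract is that a champion weight vector is perturbed into a challenger and the two are compared by playing against each other, with no gradient or temporal difference update.

```python
import random

# Hypothetical "ideal" weights for a toy game. In the paper the game is
# backgammon and the weights belong to a neural network evaluation function.
TARGET = [0.5, -0.25, 0.75]


def strength(weights):
    """Toy playing strength: higher is better (assumed for illustration)."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def challenger_wins(champion, challenger, rng, noise=0.02):
    """Play one noisy toy game; the noise stands in for dice rolls.

    The outcome depends only on how the two players compare,
    which is what makes the fitness relative rather than absolute.
    """
    champ_score = strength(champion) + rng.gauss(0, noise)
    chall_score = strength(challenger) + rng.gauss(0, noise)
    return chall_score > champ_score


def hill_climb(n_weights=3, generations=2000, games=4,
               sigma=0.05, blend=0.05, seed=0):
    """Hill-climb a champion by playing it against mutated challengers."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights
    for _ in range(generations):
        # Mutate the champion with Gaussian noise to get a challenger.
        challenger = [w + rng.gauss(0, sigma) for w in champion]
        wins = sum(challenger_wins(champion, challenger, rng)
                   for _ in range(games))
        # Accept only on a strict majority of games, and then move the
        # champion a small step toward the challenger (assumed rule).
        if wins > games // 2:
            champion = [(1 - blend) * c + blend * m
                        for c, m in zip(champion, challenger)]
    return champion


if __name__ == "__main__":
    final = hill_climb()
    print("start strength:", strength([0.0, 0.0, 0.0]))
    print("final strength:", strength(final))
```

Moving only part of the way toward a winning challenger, rather than replacing the champion outright, is one way to damp the effect of lucky wins in a noisy game; the step size used here is an arbitrary choice.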

Cite

Text

Pollack and Blair. "Why Did TD-Gammon Work?" Neural Information Processing Systems, 1996.

Markdown

[Pollack and Blair. "Why Did TD-Gammon Work?" Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/pollack1996neurips-tdgammon/)

BibTeX

@inproceedings{pollack1996neurips-tdgammon,
  title     = {{Why Did TD-Gammon Work?}},
  author    = {Pollack, Jordan B. and Blair, Alan D.},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {10-16},
  url       = {https://mlanthology.org/neurips/1996/pollack1996neurips-tdgammon/}
}