Why Did TD-Gammon Work?
Abstract
Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000-parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself.
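To make "hill-climbing in a relative fitness environment" concrete, here is a minimal sketch of the general idea: a champion weight vector is perturbed into a challenger, the two play each other, and the challenger replaces the champion only if it wins most of the games. Everything in it is an assumption for illustration and is not taken from the paper: the toy `play_game` (a noisy comparison against a hidden `TARGET` vector, standing in for backgammon and its dice), the three-element weight vector (instead of a 4000-parameter network), the mutation size, and the majority-of-games replacement rule.

```python
import random

# Hypothetical "ideal" weights for the toy game; a real setup would have
# no such target, only game outcomes.
TARGET = [0.5, -0.25, 0.75]


def strength(weights):
    """Toy playing strength: negative squared distance to TARGET."""
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def play_game(player_a, player_b, rng, noise=0.02):
    """Return True if player_a wins. Gaussian noise stands in for dice."""
    score_a = strength(player_a) + rng.gauss(0, noise)
    score_b = strength(player_b) + rng.gauss(0, noise)
    return score_a > score_b


def mutate(weights, rng, sigma=0.1):
    """Create a challenger by adding Gaussian noise to every weight."""
    return [w + rng.gauss(0, sigma) for w in weights]


def hill_climb(n_weights=3, generations=2000, games=5, seed=0):
    """Hill-climb using only relative fitness (champion vs. challenger)."""
    rng = random.Random(seed)
    champion = [0.0] * n_weights
    for _ in range(generations):
        challenger = mutate(champion, rng)
        wins = sum(play_game(challenger, champion, rng) for _ in range(games))
        # The learner never sees an absolute score, only who won more games.
        if wins * 2 > games:
            champion = challenger
    return champion


if __name__ == "__main__":
    final = hill_climb()
    print("start strength:", strength([0.0] * 3))
    print("final strength:", strength(final))
```

The point of the sketch is that the update rule uses no gradient and no absolute evaluation: `strength` is called only inside the toy game, so the learner itself sees nothing but wins and losses against its own current champion.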
Cite
Text
Pollack and Blair. "Why Did TD-Gammon Work?." Neural Information Processing Systems, 1996.
Markdown
[Pollack and Blair. "Why Did TD-Gammon Work?." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/pollack1996neurips-tdgammon/)
BibTeX
@inproceedings{pollack1996neurips-tdgammon,
title = {{Why Did TD-Gammon Work?}},
author = {Pollack, Jordan B. and Blair, Alan D.},
booktitle = {Neural Information Processing Systems},
year = {1996},
pages = {10-16},
url = {https://mlanthology.org/neurips/1996/pollack1996neurips-tdgammon/}
}