Temporal Difference Learning Applied to a High-Performance Game-Playing Program

Abstract

The temporal difference (TD) learning algorithm offers the hope that the arduous task of manually tuning the evaluation function weights of game-playing programs can be automated. With one exception (TD-Gammon), TD learning has not been demonstrated to be effective in a high-performance, world-class game-playing program. Further, game-program developers have expressed doubt that learned weights could compete with the best hand-tuned weights. Chinook is the World Man-Machine Checkers Champion; its evaluation function weights were hand-tuned over 5 years. This paper shows that TD learning is capable of competing with the best human effort.
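The weight tuning the abstract refers to can be sketched as a TD(λ) update over a linear evaluation function. This is a minimal illustrative sketch, not Chinook's actual training setup: the feature vectors, step size, and trace decay below are assumptions for demonstration only.

```python
def td_lambda_update(weights, features_seq, rewards_seq,
                     alpha=0.01, lam=0.7, gamma=1.0):
    """One pass of TD(lambda) over a game, updating linear evaluation weights.

    features_seq: list of feature vectors, one per position in the game.
    rewards_seq:  reward observed after each transition (e.g. game outcome
                  at the end, zeros elsewhere). Illustrative values only.
    """
    trace = [0.0] * len(weights)  # eligibility trace, one entry per weight
    for t in range(len(features_seq) - 1):
        phi, phi_next = features_seq[t], features_seq[t + 1]
        # Linear evaluation: value = dot(weights, features)
        v = sum(w * f for w, f in zip(weights, phi))
        v_next = sum(w * f for w, f in zip(weights, phi_next))
        delta = rewards_seq[t] + gamma * v_next - v  # TD error
        # Decay old traces and accumulate the current position's features
        trace = [gamma * lam * e + f for e, f in zip(trace, phi)]
        # Move each weight along its trace, scaled by the TD error
        weights = [w + alpha * delta * e for w, e in zip(weights, trace)]
    return weights
```

In this formulation the eligibility trace spreads credit for the final game outcome back over earlier positions, which is what lets self-play games stand in for hand-tuning.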

Cite

Text

Schaeffer et al. "Temporal Difference Learning Applied to a High-Performance Game-Playing Program." International Joint Conference on Artificial Intelligence, 2001.

Markdown

[Schaeffer et al. "Temporal Difference Learning Applied to a High-Performance Game-Playing Program." International Joint Conference on Artificial Intelligence, 2001.](https://mlanthology.org/ijcai/2001/schaeffer2001ijcai-temporal/)

BibTeX

@inproceedings{schaeffer2001ijcai-temporal,
  title     = {{Temporal Difference Learning Applied to a High-Performance Game-Playing Program}},
  author    = {Schaeffer, Jonathan and Hlynka, Markian and Jussila, Vili},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2001},
  pages     = {529-534},
  url       = {https://mlanthology.org/ijcai/2001/schaeffer2001ijcai-temporal/}
}