Temporal Difference Learning Applied to a High-Performance Game-Playing Program
Abstract
The temporal difference (TD) learning algorithm offers the hope that the arduous task of manually tuning the evaluation function weights of game-playing programs can be automated. With one exception (TD-Gammon), TD learning has not been demonstrated to be effective in a high-performance, world-class game-playing program. Further, game-program developers have expressed doubt that learned weights could compete with the best hand-tuned weights. Chinook is the World Man-Machine Checkers Champion; its evaluation function weights were hand-tuned over 5 years. This paper shows that TD learning is capable of competing with the best human effort.
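To make the idea concrete, the following is a minimal sketch of a linear TD(λ) weight update for a game evaluation function, the general technique the abstract refers to. The features, game positions, and all parameter values are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch: one TD(lambda) step for a linear evaluator V(s) = w . x(s).
# All feature vectors and hyperparameters below are made up for illustration.

def td_lambda_update(w, e, x_t, v_t, v_next, reward,
                     alpha=0.01, gamma=1.0, lam=0.7):
    """Update weights w and eligibility traces e after one move."""
    delta = reward + gamma * v_next - v_t                   # TD error
    e = [gamma * lam * ei + xi for ei, xi in zip(e, x_t)]   # decay traces, add features
    w = [wi + alpha * delta * ei for wi, ei in zip(w, e)]   # move weights toward target
    return w, e

def evaluate(w, x):
    """Linear evaluation function: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy walkthrough: two positions of a hypothetical game, ending in a win (+1).
w = [0.0, 0.0]                       # weights for two made-up features
e = [0.0, 0.0]                       # eligibility traces
x0, x1 = [1.0, 0.5], [0.5, 1.0]      # feature vectors of the two positions
v0, v1 = evaluate(w, x0), evaluate(w, x1)
w, e = td_lambda_update(w, e, x0, v0, v1, reward=1.0)
```

Iterating this update over many self-play games is what lets the weights converge without hand-tuning; the paper's contribution is showing this can match 5 years of expert manual tuning.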
Cite
Text
Schaeffer et al. "Temporal Difference Learning Applied to a High-Performance Game-Playing Program." International Joint Conference on Artificial Intelligence, 2001.
Markdown
[Schaeffer et al. "Temporal Difference Learning Applied to a High-Performance Game-Playing Program." International Joint Conference on Artificial Intelligence, 2001.](https://mlanthology.org/ijcai/2001/schaeffer2001ijcai-temporal/)
BibTeX
@inproceedings{schaeffer2001ijcai-temporal,
title = {{Temporal Difference Learning Applied to a High-Performance Game-Playing Program}},
author = {Schaeffer, Jonathan and Hlynka, Markian and Jussila, Vili},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2001},
pages = {529-534},
url = {https://mlanthology.org/ijcai/2001/schaeffer2001ijcai-temporal/}
}