The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy
Abstract
Reinforcement learning has been used for training game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield good results. This paper reports on a direct comparison between an agent trained to play gin rummy using temporal-difference learning and the same agent trained with coevolution. Coevolution produced superior results.
ICML Proceedings of the Twentieth International Conference on Machine Learning
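The temporal-difference idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's gin rummy agent or its value approximator: it assumes a linear value function over one-hot features and a toy five-state random walk, and the names `td0_update`, `one_hot`, and `train_random_walk` are hypothetical helpers introduced only for illustration.

```python
import random


def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=1.0):
    """One TD(0) step for a linear value function V(s) = w . phi(s).

    Assumption: linear approximation, chosen only to keep the sketch short.
    """
    v = sum(w * f for w, f in zip(weights, features))
    v_next = sum(w * f for w, f in zip(weights, next_features))
    # TD error: how much the one-step bootstrapped target differs from V(s).
    td_error = reward + gamma * v_next - v
    # Gradient of a linear V with respect to w is the feature vector itself.
    return [w + alpha * td_error * f for w, f in zip(weights, features)]


def one_hot(state, n_states):
    """Feature vector with a single 1.0 at the index of the state."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


def train_random_walk(n_states=5, episodes=2000, alpha=0.05, seed=0):
    """Toy stand-in for a game: walk left or right at random.

    Falling off the right end pays 1, falling off the left end pays 0.
    """
    rng = random.Random(seed)
    weights = [0.0] * n_states
    terminal = [0.0] * n_states  # terminal states have value 0 by construction
    for _ in range(episodes):
        state = n_states // 2
        while True:
            phi = one_hot(state, n_states)
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:
                weights = td0_update(weights, phi, 0.0, terminal, alpha)
                break
            if next_state >= n_states:
                weights = td0_update(weights, phi, 1.0, terminal, alpha)
                break
            weights = td0_update(weights, phi, 0.0, one_hot(next_state, n_states), alpha)
            state = next_state
    return weights


if __name__ == "__main__":
    # Learned values should rise from the left end toward the right end.
    print([round(w, 2) for w in train_random_walk()])
```

In a self-play setting the transitions would come from games the agent plays against itself rather than from a random walk. A coevolutionary approach would instead perturb and select whole weight vectors based on game outcomes, without using the per-step TD error.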
Cite
Text
Kotnik and Kalita. "The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy." International Conference on Machine Learning, 2003.
Markdown
[Kotnik and Kalita. "The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy." International Conference on Machine Learning, 2003.](https://mlanthology.org/icml/2003/kotnik2003icml-significance/)
BibTeX
@inproceedings{kotnik2003icml-significance,
title = {{The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy}},
author = {Kotnik, Clifford and Kalita, Jugal K.},
booktitle = {International Conference on Machine Learning},
year = {2003},
pages = {369-375},
url = {https://mlanthology.org/icml/2003/kotnik2003icml-significance/}
}