The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy

Abstract

Reinforcement learning has been used to train game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield good results. This paper reports on a direct comparison between an agent trained to play gin rummy using temporal-difference learning and the same agent trained with coevolution. Coevolution produced superior results.

Proceedings of the Twentieth International Conference on Machine Learning (ICML)

Cite

Text

Kotnik and Kalita. "The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy." International Conference on Machine Learning, 2003.

Markdown

[Kotnik and Kalita. "The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy." International Conference on Machine Learning, 2003.](https://mlanthology.org/icml/2003/kotnik2003icml-significance/)

BibTeX

@inproceedings{kotnik2003icml-significance,
  title     = {{The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy Versus EVO-Rummy}},
  author    = {Kotnik, Clifford and Kalita, Jugal K.},
  booktitle = {International Conference on Machine Learning},
  year      = {2003},
  pages     = {369-375},
  url       = {https://mlanthology.org/icml/2003/kotnik2003icml-significance/}
}