Minimax TD-Learning with Neural Nets in a Markov Game

Abstract

A minimax version of temporal difference learning (minimax TD-learning) is given, similar to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum game with imperfect information of the Markov game class. Two different criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Practical aspects of linear programming and fictitious play, as used for solving matrix games, are also discussed.
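The abstract mentions fictitious play as one way to (approximately) solve the zero-sum matrix games that arise at each stage. A minimal sketch of that idea, not taken from the paper itself (the function name, payoff matrix, and iteration count are illustrative assumptions): each player repeatedly best-responds to the opponent's empirical strategy mix, and for zero-sum games the empirical frequencies approximate an optimal mixed strategy.

```python
def fictitious_play(A, iters=20000):
    """Approximately solve the zero-sum matrix game with payoff matrix A
    (row player maximizes, column player minimizes) via fictitious play.
    Illustrative sketch only; names and defaults are assumptions."""
    m, n = len(A), len(A[0])
    row_counts = [0] * m   # how often each row strategy was played
    col_counts = [0] * n   # how often each column strategy was played
    i, j = 0, 0            # arbitrary initial pure strategies
    for _ in range(iters):
        row_counts[i] += 1
        col_counts[j] += 1
        # Row player best-responds (maximizes) against the column player's
        # empirical mixture so far.
        i = max(range(m),
                key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
        # Column player best-responds (minimizes) against the row player's
        # empirical mixture so far.
        j = min(range(n),
                key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
    p = [c / iters for c in row_counts]  # empirical row strategy
    q = [c / iters for c in col_counts]  # empirical column strategy
    value = sum(p[r] * A[r][c] * q[c] for r in range(m) for c in range(n))
    return p, q, value
```

For matching pennies, `A = [[1, -1], [-1, 1]]`, the empirical strategies approach (0.5, 0.5) for both players and the estimated game value approaches 0; convergence is slow, which is one of the practical trade-offs versus linear programming that the paper discusses.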

Cite

Text

Dahl and Halck. "Minimax TD-Learning with Neural Nets in a Markov Game." European Conference on Machine Learning, 2000. doi:10.1007/3-540-45164-1_13

Markdown

[Dahl and Halck. "Minimax TD-Learning with Neural Nets in a Markov Game." European Conference on Machine Learning, 2000.](https://mlanthology.org/ecmlpkdd/2000/dahl2000ecml-minimax/) doi:10.1007/3-540-45164-1_13

BibTeX

@inproceedings{dahl2000ecml-minimax,
  title     = {{Minimax TD-Learning with Neural Nets in a Markov Game}},
  author    = {Dahl, Fredrik A. and Halck, Ole Martin},
  booktitle = {European Conference on Machine Learning},
  year      = {2000},
  pages     = {117--128},
  doi       = {10.1007/3-540-45164-1_13},
  url       = {https://mlanthology.org/ecmlpkdd/2000/dahl2000ecml-minimax/}
}