Minimax TD-Learning with Neural Nets in a Markov Game
Abstract
A minimax version of temporal difference learning (minimax TD-learning) is given, similar to minimax Q-learning. The algorithm is used to train a neural net to play Campaign, a two-player zero-sum Markov game with imperfect information. Two different criteria for evaluating game-playing agents are used, and their relation to game theory is shown. Practical aspects of linear programming and fictitious play for solving matrix games are also discussed.
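As a rough illustration of one of the matrix-game solvers mentioned in the abstract, the following sketch implements fictitious play for a two-player zero-sum matrix game. The payoff matrix, iteration count, and function name are illustrative assumptions, not taken from the paper; each player repeatedly best-responds to the opponent's empirical mixture of past moves, and the empirical frequencies approximate the minimax strategies.

```python
import numpy as np

def fictitious_play(A, iters=10000):
    """Approximate minimax strategies for the zero-sum matrix game A
    (row player maximizes, column player minimizes). Illustrative sketch."""
    m, n = A.shape
    row_counts = np.zeros(m)  # how often each row action has been played
    col_counts = np.zeros(n)  # how often each column action has been played
    row, col = 0, 0  # arbitrary initial pure strategies
    for _ in range(iters):
        row_counts[row] += 1
        col_counts[col] += 1
        # Best response to the opponent's empirical mixture so far.
        row = int(np.argmax(A @ col_counts))
        col = int(np.argmin(row_counts @ A))
    return row_counts / iters, col_counts / iters

# Matching pennies: both players' optimal mixture is (0.5, 0.5).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x, y = fictitious_play(A)
```

Fictitious play is simple but converges slowly; an exact minimax solution of the same matrix game can be obtained by linear programming, the other approach the paper compares.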
Cite
Text
Dahl and Halck. "Minimax TD-Learning with Neural Nets in a Markov Game." European Conference on Machine Learning, 2000. doi:10.1007/3-540-45164-1_13
Markdown
[Dahl and Halck. "Minimax TD-Learning with Neural Nets in a Markov Game." European Conference on Machine Learning, 2000.](https://mlanthology.org/ecmlpkdd/2000/dahl2000ecml-minimax/) doi:10.1007/3-540-45164-1_13
BibTeX
@inproceedings{dahl2000ecml-minimax,
title = {{Minimax TD-Learning with Neural Nets in a Markov Game}},
author = {Dahl, Fredrik A. and Halck, Ole Martin},
booktitle = {European Conference on Machine Learning},
year = {2000},
pages = {117--128},
doi = {10.1007/3-540-45164-1_13},
url = {https://mlanthology.org/ecmlpkdd/2000/dahl2000ecml-minimax/}
}