Multi-Agent Learning Experiments on Repeated Matrix Games

Abstract

This paper experimentally evaluates multi-agent learning algorithms playing repeated matrix games to maximize their cumulative return. Previous work concluded that Q-learning surpassed Nash-based multi-agent learning algorithms. Based on all-against-all repeated matrix game tournaments, this paper updates the state of the art of multi-agent learning experiments. In a first stage, it shows that M-Qubed, S, and bandit-based algorithms such as UCB are the best algorithms on general-sum games, while Exp3 is the best on cooperative games and zero-sum games. In a second stage, our experiments show that two features, forgetting the far past and using recent history as state, improve the learning algorithms. Finally, the best algorithms are two new algorithms, Q-learning and UCB enhanced with the two features, and M-Qubed.
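The bandit-based learners the abstract mentions treat each repeated game as a sequence of independent action choices. As a minimal sketch (not the paper's implementation), the following shows standard UCB1 selecting rows of a payoff matrix against a fixed mixed-strategy opponent; the payoff matrix and opponent policy are hypothetical illustrations.

```python
import math
import random

def ucb1(payoff, opponent_policy, rounds=5000, seed=0):
    """UCB1 agent repeatedly choosing a row of `payoff`; the column
    is drawn from a fixed opponent mixed strategy (an illustrative
    simplification of a repeated matrix game).
    Returns per-action pull counts and empirical mean rewards."""
    rng = random.Random(seed)
    n_actions = len(payoff)
    counts = [0] * n_actions
    means = [0.0] * n_actions
    for t in range(1, rounds + 1):
        if t <= n_actions:
            a = t - 1  # play each action once to initialize estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            a = max(range(n_actions),
                    key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        col = rng.choices(range(len(payoff[0])), weights=opponent_policy)[0]
        r = payoff[a][col]
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]  # incremental mean update
    return counts, means

# Hypothetical 2x2 row payoffs; opponent plays column 0 with probability 0.8.
payoff = [[1.0, 0.0],
          [0.2, 0.3]]
counts, means = ucb1(payoff, opponent_policy=[0.8, 0.2])
```

Against this stationary opponent, row 0 has the higher expected payoff (0.8 vs. 0.22), so UCB1 concentrates its pulls on it; the paper's enhanced variants additionally forget the far past and condition on recent joint-action history.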

Cite

Text

Bouzy and Métivier. "Multi-Agent Learning Experiments on Repeated Matrix Games." International Conference on Machine Learning, 2010.

Markdown

[Bouzy and Métivier. "Multi-Agent Learning Experiments on Repeated Matrix Games." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/bouzy2010icml-multi/)

BibTeX

@inproceedings{bouzy2010icml-multi,
  title     = {{Multi-Agent Learning Experiments on Repeated Matrix Games}},
  author    = {Bouzy, Bruno and Métivier, Marc},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {119--126},
  url       = {https://mlanthology.org/icml/2010/bouzy2010icml-multi/}
}