Multi-Agent Learning Experiments on Repeated Matrix Games
Abstract
This paper experimentally evaluates multi-agent learning algorithms playing repeated matrix games to maximize their cumulative return. Previous work found that Q-learning surpassed Nash-based multi-agent learning algorithms. Based on all-against-all repeated matrix game tournaments, this paper updates the state of the art of multi-agent learning experiments. In a first stage, it shows that M-Qubed, S, and bandit-based algorithms such as UCB are the best algorithms on general-sum games, while Exp3 is the best on cooperative games and zero-sum games. In a second stage, our experiments show that two features, forgetting the far past and conditioning on recent history as state, improve the learning algorithms. Finally, the best algorithms are M-Qubed and two new algorithms: Q-learning and UCB enhanced with the two features.
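The two features highlighted in the abstract can be illustrated with a minimal sketch, assuming a standard Q-learning agent on a repeated 2x2 matrix game. This is not the paper's implementation; the payoffs, parameters, and fixed opponent below are illustrative assumptions. "Forgetting the far past" is modeled by a constant learning rate, which weights recent payoffs exponentially more than old ones; "recent history with states" is modeled by using the last joint action as the state.

```python
import random
from collections import defaultdict

# Illustrative sketch (not the paper's code): Q-learning on a repeated
# Prisoner's Dilemma, row player's payoffs, actions 0=cooperate, 1=defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

ALPHA = 0.1      # constant step size: exponentially forgets the far past
GAMMA = 0.9      # discount factor (illustrative value)
EPSILON = 0.1    # epsilon-greedy exploration rate
ROUNDS = 10000

Q = defaultdict(float)   # Q[(state, action)], default 0.0
state = None             # state = last joint action (None before round 1)

random.seed(0)
for _ in range(ROUNDS):
    # Epsilon-greedy action selection conditioned on the recent-history state.
    if random.random() < EPSILON:
        a = random.randrange(2)
    else:
        a = max((0, 1), key=lambda x: Q[(state, x)])
    b = 1  # fixed always-defect opponent, purely for illustration
    r = PAYOFF[(a, b)]
    next_state = (a, b)
    # One-step Q update; the constant ALPHA discounts old experience.
    best_next = max(Q[(next_state, 0)], Q[(next_state, 1)])
    Q[(state, a)] += ALPHA * (r + GAMMA * best_next - Q[(state, a)])
    state = next_state

# Against an always-defect opponent, the greedy action should be defection.
best = max((0, 1), key=lambda x: Q[(state, x)])
print(best)
```

Against the always-defect opponent, defecting earns 1 per round and cooperating earns 0, so the learned greedy action is defection. The same state-and-forgetting scaffolding can wrap a UCB-style action selector, which is the spirit of the paper's enhanced algorithms.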
Cite
Bouzy and Métivier. "Multi-Agent Learning Experiments on Repeated Matrix Games." International Conference on Machine Learning, 2010.

BibTeX
@inproceedings{bouzy2010icml-multi,
title = {{Multi-Agent Learning Experiments on Repeated Matrix Games}},
author = {Bouzy, Bruno and Métivier, Marc},
booktitle = {International Conference on Machine Learning},
year = {2010},
pages = {119-126},
url = {https://mlanthology.org/icml/2010/bouzy2010icml-multi/}
}