Convergence and No-Regret in Multiagent Learning
Abstract
Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
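For a concrete picture of how the algorithm operates, the sketch below implements the three-step GIGA-WoLF update in Python: a projected-gradient step, a slower "baseline" step, and a correction pulling the strategy toward the baseline. This is a minimal sketch based on the update rule described in the paper; the helper names (`project_simplex`, `giga_wolf_step`), the fixed step size in the usage example, and the guard against a zero denominator in the correction step are illustrative assumptions, not the paper's pseudocode.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def giga_wolf_step(x, z, r, eta):
    """One GIGA-WoLF update (sketch).

    x   -- current mixed strategy (point on the simplex)
    z   -- slower baseline strategy used to modulate the step
    r   -- reward vector for each action this round
    eta -- gradient step size
    Returns the updated (x, z) pair.
    """
    # Step 1: ordinary GIGA update (projected gradient ascent).
    x_hat = project_simplex(x + eta * r)
    # Step 2: a more cautious update of the baseline, with step size eta/3.
    z_new = project_simplex(z + (eta / 3.0) * r)
    # Step 3: pull x_hat toward z_new, limited by how far z itself moved.
    # (The zero-denominator guard is an assumption for the degenerate case.)
    denom = np.linalg.norm(z_new - x_hat)
    delta = 1.0 if denom == 0.0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    x_new = x_hat + delta * (z_new - x_hat)
    return x_new, z_new

# Example: one update in matching pennies, starting from the uniform strategy.
x = np.ones(2) / 2
z = x.copy()
r = np.array([1.0, -1.0])   # hypothetical rewards vs. the opponent's last play
x, z = giga_wolf_step(x, z, r, eta=0.1)
```

Because the baseline z moves with a smaller step size, the correction shrinks the effective step whenever the gradient strategy races ahead of it. This gives a WoLF-style ("Win or Learn Fast") modulation of the learning rate without requiring knowledge of an equilibrium strategy.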
Cite
Text
Bowling. "Convergence and No-Regret in Multiagent Learning." Neural Information Processing Systems, 2004.
Markdown
[Bowling. "Convergence and No-Regret in Multiagent Learning." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/bowling2004neurips-convergence/)
BibTeX
@inproceedings{bowling2004neurips-convergence,
title = {{Convergence and No-Regret in Multiagent Learning}},
author = {Bowling, Michael},
booktitle = {Neural Information Processing Systems},
year = {2004},
pages = {209--216},
url = {https://mlanthology.org/neurips/2004/bowling2004neurips-convergence/}
}