Convergence and No-Regret in Multiagent Learning

Abstract

Learning in a multiagent system is a challenging problem due to two key factors. First, if other agents are simultaneously learning, then the environment is no longer stationary, thus undermining convergence guarantees. Second, learning is often susceptible to deception, where the other agents may be able to exploit a learner's particular dynamics. In the worst case, this could result in poorer performance than if the agent were not learning at all. These challenges are identifiable in the two most common evaluation criteria for multiagent learning algorithms: convergence and regret. Algorithms focusing on convergence or regret in isolation are numerous. In this paper, we seek to address both criteria in a single algorithm by introducing GIGA-WoLF, a learning algorithm for normal-form games. We prove the algorithm guarantees at most zero average regret, while demonstrating the algorithm converges in many situations of self-play. We prove convergence in a limited setting and give empirical results in a wider variety of situations. These results also suggest a third new learning criterion combining convergence and regret, which we call negative non-convergence regret (NNR).
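As background (not stated in the abstract itself), "at most zero average regret" is the standard no-regret criterion: an algorithm playing mixed strategies x_t and receiving payoff-gradient vectors r_t should satisfy

\limsup_{T \to \infty} \; \frac{1}{T} \, \max_{x \in \Delta} \, \sum_{t=1}^{T} \left( r_t \cdot x - r_t \cdot x_t \right) \;\le\; 0,

i.e. in hindsight, no single fixed strategy x in the probability simplex \Delta would have done better on average.

The abstract does not spell out the update rule, but for intuition: GIGA (Zinkevich's Generalized Infinitesimal Gradient Ascent) is projected gradient ascent on the simplex, and GIGA-WoLF additionally tracks a second, slower-moving strategy z used to throttle the main strategy's step when it is outperforming that baseline (the "Win or Learn Fast" idea). The sketch below follows my recollection of the paper's scheme; the eta/3 baseline step and the delta mixing rule in particular should be treated as illustrative assumptions, not as the authoritative algorithm.

import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex
    # (standard sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    cond = u - css / idx > 0
    rho = idx[cond][-1]
    theta = css[cond][-1] / rho
    return np.maximum(v - theta, 0.0)

def giga_wolf_step(x, z, r, eta):
    # One update of the two-strategy scheme sketched above.
    #   x   : current strategy (point on the simplex)
    #   z   : slower 'baseline' strategy (point on the simplex)
    #   r   : payoff-gradient vector observed this round
    #   eta : step size
    x_hat = project_simplex(x + eta * r)        # ordinary GIGA step
    z_new = project_simplex(z + eta * r / 3.0)  # slower baseline step (assumed eta/3)
    denom = np.linalg.norm(z_new - x_hat)
    # Mixing weight pulling x back toward the baseline; guard against
    # division by zero when the two updated strategies coincide.
    delta = 1.0 if denom == 0.0 else min(1.0, np.linalg.norm(z_new - z) / denom)
    x_new = x_hat + delta * (z_new - x_hat)
    return x_new, z_new

# Hypothetical usage in matching pennies (A is the row player's payoff matrix):
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = z = np.ones(2) / 2
y = np.array([0.7, 0.3])   # opponent's current mixed strategy (made up here)
r = A @ y                  # expected payoff gradient for the row player
x, z = giga_wolf_step(x, z, r, eta=0.1)

Since x_new lies on the segment between two simplex points, it remains a valid mixed strategy; in self-play, each agent would apply this same update against the other's evolving strategy.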

Cite

Text

Bowling. "Convergence and No-Regret in Multiagent Learning." Neural Information Processing Systems, 2004.

Markdown

[Bowling. "Convergence and No-Regret in Multiagent Learning." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/bowling2004neurips-convergence/)

BibTeX

@inproceedings{bowling2004neurips-convergence,
  title     = {{Convergence and No-Regret in Multiagent Learning}},
  author    = {Bowling, Michael},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {209--216},
  url       = {https://mlanthology.org/neurips/2004/bowling2004neurips-convergence/}
}