Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: A Mean Field Theoretic Approach

Abstract

Modelling the dynamics of multi-agent learning has long been an important research topic, but previous works focus on 2-agent settings and mostly adopt evolutionary game-theoretic approaches. In this paper, we study an n-agent setting in which n tends to infinity, where agents learn their policies concurrently by playing repeated symmetric bimatrix games against other agents. Using mean field theory, we approximate the effects of other agents on a single agent by an averaged effect. A Fokker-Planck equation that describes the evolution of the probability distribution of Q-values in the agent population is derived. To the best of our knowledge, this is the first work to show that the Q-learning dynamics under an n-agent setting can be described by a system of only three equations. We validate our model through comparisons with agent-based simulations on typical symmetric bimatrix games under different initial settings of Q-values.
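To make the setting concrete, below is a minimal sketch of the kind of agent-based simulation the abstract refers to: a large population of independent Q-learners repeatedly matched into pairs to play a symmetric 2x2 game, with the population distribution of Q-values tracked over time. The specific payoff matrix, learning rate, Boltzmann temperature, and random-matching scheme are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

# Assumed symmetric 2x2 game (a simple coordination game) and hyperparameters;
# these are illustrative choices, not values from the paper.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
ALPHA = 0.1      # Q-learning step size
TAU = 0.5        # Boltzmann exploration temperature
N_AGENTS = 1000
N_ROUNDS = 500

rng = np.random.default_rng(0)
# Each agent keeps one Q-value per action; initial Q-values drawn at random.
Q = rng.normal(0.0, 1.0, size=(N_AGENTS, 2))

def boltzmann_actions(q, tau):
    """Sample one action per agent from a Boltzmann (softmax) policy over its Q-values."""
    logits = q / tau
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    u = rng.random(len(q))
    return (u > probs[:, 0]).astype(int)  # action 0 or 1 per agent

for t in range(N_ROUNDS):
    # Randomly match agents into pairs for one play of the stage game.
    perm = rng.permutation(N_AGENTS)
    a = boltzmann_actions(Q, TAU)
    for i, j in zip(perm[0::2], perm[1::2]):
        r_i = PAYOFF[a[i], a[j]]
        r_j = PAYOFF[a[j], a[i]]
        # Stateless Q-learning update for the action each agent actually played.
        Q[i, a[i]] += ALPHA * (r_i - Q[i, a[i]])
        Q[j, a[j]] += ALPHA * (r_j - Q[j, a[j]])

# The empirical distribution of Q-values across the population is the quantity
# whose evolution the paper's Fokker-Planck equation approximates as n -> infinity.
print("mean Q per action:", Q.mean(axis=0))
print("std  Q per action:", Q.std(axis=0))
```

In the mean-field view, each agent's opponent is replaced by an averaged effect of the whole population, so the per-pair loop above is what the derived equations are meant to summarize without simulating every agent.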

Cite

Text

Hu et al. "Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: A Mean Field Theoretic Approach." Neural Information Processing Systems, 2019.

Markdown

[Hu et al. "Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: A Mean Field Theoretic Approach." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/hu2019neurips-modelling/)

BibTeX

@inproceedings{hu2019neurips-modelling,
  title     = {{Modelling the Dynamics of Multiagent Q-Learning in Repeated Symmetric Games: A Mean Field Theoretic Approach}},
  author    = {Hu, Shuyue and Leung, Chin-wing and Leung, Ho-fung},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {12125--12135},
  url       = {https://mlanthology.org/neurips/2019/hu2019neurips-modelling/}
}