Classes of Multiagent Q-Learning Dynamics with Epsilon-Greedy Exploration
Abstract
Q-learning in single-agent environments is known to converge in the limit given sufficient exploration. The same algorithm has been applied, with some success, in multiagent environments, where traditional analysis techniques break down. Using established dynamical systems methods, we derive and study an idealization of Q-learning in 2-player 2-action repeated general-sum games. In particular, we address the discontinuous case of epsilon-greedy exploration and use it as a proxy for value-based algorithms to highlight a contrast with existing results in policy search. Analogously to previous results for gradient ascent algorithms, we provide a complete catalog of the convergence behavior of the epsilon-greedy Q-learning algorithm by introducing new subclasses of these games. We identify two subclasses of Prisoner's Dilemma-like games where the application of Q-learning with epsilon-greedy exploration results in higher-than-Nash payoffs for some initial conditions.
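To make the setting concrete, the following is a minimal sketch (not the paper's idealized dynamics) of the algorithm being analyzed: two independent Q-learners with epsilon-greedy exploration playing a repeated 2-player 2-action game. The Prisoner's Dilemma payoff matrix, learning rate, discount factor, and exploration rate below are illustrative assumptions, not values taken from the paper.

```python
import random

# Illustrative Prisoner's Dilemma payoffs (assumed, not from the paper).
# Actions: 0 = cooperate, 1 = defect.
PAYOFFS = {  # (row_action, col_action) -> (row_payoff, col_payoff)
    (0, 0): (3, 3),
    (0, 1): (0, 5),
    (1, 0): (5, 0),
    (1, 1): (1, 1),
}

def epsilon_greedy(q, epsilon, rng):
    """Greedy action with probability 1 - epsilon, else uniform random."""
    if rng.random() < epsilon:
        return rng.randrange(2)
    return max(range(2), key=lambda a: q[a])

def run(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Two stateless Q-learners repeatedly play the 2x2 game."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]  # one Q-value per action (single state)
    for _ in range(steps):
        a1 = epsilon_greedy(q1, epsilon, rng)
        a2 = epsilon_greedy(q2, epsilon, rng)
        r1, r2 = PAYOFFS[(a1, a2)]
        # Standard single-state Q-update: Q(a) += alpha * (r + gamma*max_a' Q(a') - Q(a)).
        q1[a1] += alpha * (r1 + gamma * max(q1) - q1[a1])
        q2[a2] += alpha * (r2 + gamma * max(q2) - q2[a2])
    return q1, q2
```

The epsilon-greedy rule is the source of the discontinuity the paper studies: the greedy action switches abruptly when the Q-values cross, unlike smooth policy-gradient or softmax dynamics.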
Cite
Text
Wunder et al. "Classes of Multiagent Q-Learning Dynamics with Epsilon-Greedy Exploration." International Conference on Machine Learning, 2010.

Markdown
[Wunder et al. "Classes of Multiagent Q-Learning Dynamics with Epsilon-Greedy Exploration." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/wunder2010icml-classes/)

BibTeX
@inproceedings{wunder2010icml-classes,
title = {{Classes of Multiagent Q-Learning Dynamics with Epsilon-Greedy Exploration}},
author = {Wunder, Michael and Littman, Michael L. and Babes, Monica},
booktitle = {International Conference on Machine Learning},
year = {2010},
pages = {1167-1174},
url = {https://mlanthology.org/icml/2010/wunder2010icml-classes/}
}