MAVEN: Multi-Agent Variational Exploration

Abstract

Centralised training with decentralised execution is an important setting for cooperative deep multi-agent reinforcement learning due to communication constraints during execution and computational tractability in training. In this paper, we analyse value-based methods that are known to have superior performance in complex environments. We specifically focus on QMIX, the current state-of-the-art in this domain. We show that the representational constraints on the joint action-values introduced by QMIX and similar methods lead to provably poor exploration and suboptimality. To address these limitations, we propose a novel approach called MAVEN that hybridises value and policy-based methods by introducing a latent space for hierarchical control. The value-based agents condition their behaviour on the shared latent variable controlled by a hierarchical policy. This allows MAVEN to achieve committed, temporally extended exploration, which is key to solving complex multi-agent tasks. Our experimental results show that MAVEN achieves significant performance improvements on the challenging SMAC domain.
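The sketch below illustrates the core idea described in the abstract: each agent's utility network is conditioned on a shared latent variable that a hierarchical policy samples once per episode, so all agents explore in a committed, temporally extended way for that episode. This is not the authors' implementation; the network sizes, class names (`HierarchicalPolicy`, `LatentConditionedAgent`), and dimensions are illustrative assumptions, and the value mixing and training losses of MAVEN are omitted.

```python
# Minimal sketch of latent-conditioned decentralised agents (assumed shapes/names).
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    """Samples a discrete latent z ~ p(z | s_0) once at the start of each episode."""
    def __init__(self, state_dim, n_latents):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_latents))

    def forward(self, s0):
        dist = torch.distributions.Categorical(logits=self.net(s0))
        z = dist.sample()
        return z, dist.log_prob(z)

class LatentConditionedAgent(nn.Module):
    """Per-agent utility Q_a(obs, u; z): behaviour changes with the shared latent z."""
    def __init__(self, obs_dim, n_actions, n_latents):
        super().__init__()
        self.z_embed = nn.Embedding(n_latents, 32)
        self.net = nn.Sequential(nn.Linear(obs_dim + 32, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs, z):
        h = torch.cat([obs, self.z_embed(z)], dim=-1)
        return self.net(h)  # per-action utilities, conditioned on the shared z

# Usage: fix z for the whole episode, then act (epsilon-)greedily on Q(.|z),
# so exploration is committed rather than per-step dithering.
state_dim, obs_dim, n_actions, n_latents, n_agents = 10, 8, 5, 4, 3
policy = HierarchicalPolicy(state_dim, n_latents)
agents = [LatentConditionedAgent(obs_dim, n_actions, n_latents) for _ in range(n_agents)]

s0 = torch.randn(1, state_dim)
z, logp_z = policy(s0)                       # one shared latent per episode
obs = torch.randn(n_agents, obs_dim)
actions = [agent(o.unsqueeze(0), z).argmax(dim=-1) for agent, o in zip(agents, obs)]
```

Because the latent is held fixed for an entire episode, each sampled z induces a distinct joint behaviour mode, which is the mechanism the abstract refers to as committed, temporally extended exploration.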

Cite

Text

Mahajan et al. "MAVEN: Multi-Agent Variational Exploration." Neural Information Processing Systems, 2019.

Markdown

[Mahajan et al. "MAVEN: Multi-Agent Variational Exploration." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/mahajan2019neurips-maven/)

BibTeX

@inproceedings{mahajan2019neurips-maven,
  title     = {{MAVEN: Multi-Agent Variational Exploration}},
  author    = {Mahajan, Anuj and Rashid, Tabish and Samvelyan, Mikayel and Whiteson, Shimon},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {7613--7624},
  url       = {https://mlanthology.org/neurips/2019/mahajan2019neurips-maven/}
}