Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning

Abstract

We empirically show that classic ideas from two-time-scale stochastic approximation \citep{borkar1997stochastic} can be combined with sequential iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first present a multi-agent estimation problem as a motivating example in which SIBR converges while parallel iterative best response (PIBR) does not. We then present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time-scale stochastic approximation, and show that our new methods, which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL), outperform state-of-the-art independent learning on almost all tasks in the epymarl \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time-scale learning.
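The distinction between SIBR and PIBR can be made concrete with a small numerical sketch. The toy cooperative problem below is illustrative only and is not the estimation problem studied in the paper; all function and variable names are hypothetical. Two agents choose scalars x and y and share the cost (x + y - 1)^2, so each agent's best response, holding the other fixed, sets the sum to 1. Updating both agents simultaneously (PIBR) cycles between two suboptimal points, whereas updating them one at a time (SIBR) reaches an optimum after a single sweep.

```python
def cost(x, y):
    # Shared cooperative objective: both agents want x + y = 1.
    return (x + y - 1.0) ** 2

def best_response(other):
    # Minimizing the shared cost over one agent's variable,
    # holding the other agent's variable fixed.
    return 1.0 - other

def pibr(x, y, steps=10):
    """Parallel iterative best response: both agents update simultaneously."""
    for _ in range(steps):
        x, y = best_response(y), best_response(x)
    return x, y

def sibr(x, y, steps=10):
    """Sequential (staged) iterative best response: one agent updates while
    the other is held fixed, then they swap roles."""
    for _ in range(steps):
        x = best_response(y)
        y = best_response(x)
    return x, y

x0, y0 = 0.0, 0.0
xp, yp = pibr(x0, y0)
xs, ys = sibr(x0, y0)
print("PIBR:", (xp, yp), "cost =", cost(xp, yp))  # cycles between (0,0) and (1,1); cost stays 1
print("SIBR:", (xs, ys), "cost =", cost(xs, ys))  # reaches an optimum (cost 0) after one sweep
```

In the same spirit, the staged algorithms described in the abstract (SIPPO, SIQL) can be read as replacing the exact best-response step above with an inner RL update for one agent while the other agents' policies are frozen, cycling through the agents in stages.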

Cite

Text

Nekoei et al. "Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning." ICLR 2022 Workshops: GMS, 2022.

Markdown

[Nekoei et al. "Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning." ICLR 2022 Workshops: GMS, 2022.](https://mlanthology.org/iclrw/2022/nekoei2022iclrw-staged/)

BibTeX

@inproceedings{nekoei2022iclrw-staged,
  title     = {{Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning}},
  author    = {Nekoei, Hadi and Badrinaaraayanan, Akilesh and Sinha, Amit and Amini, Mohammad and Rajendran, Janarthanan and Mahajan, Aditya and Chandar, Sarath},
  booktitle = {ICLR 2022 Workshops: GMS},
  year      = {2022},
  url       = {https://mlanthology.org/iclrw/2022/nekoei2022iclrw-staged/}
}