Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning
Abstract
We empirically show that classic ideas from two-time scale stochastic approximation \citep{borkar1997stochastic} can be combined with sequential iterative best response (SIBR) to solve complex cooperative multi-agent reinforcement learning (MARL) problems. We first present a multi-agent estimation problem as a motivating example in which SIBR converges while parallel iterative best response (PIBR) does not. We then present a general implementation of staged multi-agent RL algorithms based on SIBR and multi-time scale stochastic approximation, and show that our new methods, which we call Staged Independent Proximal Policy Optimization (SIPPO) and Staged Independent Q-learning (SIQL), outperform state-of-the-art independent learning on almost all tasks in the EPyMARL \citep{papoudakis2020benchmarking} benchmark. This can be seen as a first step towards more decentralized MARL methods based on SIBR and multi-time scale learning.
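To make the staged idea concrete, below is a minimal, self-contained sketch of sequential iterative best response with independent learners: in each stage exactly one agent updates its policy while the other agents are held fixed, an extreme form of the two-time scale separation described above. All names here (Agent, joint_reward, staged_learning, the learning rate, and the stage/step counts) are illustrative assumptions for a toy cooperative task, not the paper's code; in SIPPO or SIQL the inner update would be a PPO or Q-learning step rather than the crude REINFORCE-style step used here.

import random

class Agent:
    """Toy independent learner with a single scalar policy parameter."""
    def __init__(self, lr):
        self.theta = 0.0   # policy parameter
        self.lr = lr       # learning rate of the currently active agent

    def act(self):
        # Sample a stochastic action around the current parameter.
        return self.theta + random.gauss(0.0, 0.1)

    def update(self, reward, action):
        # Crude REINFORCE-style step; a real staged learner would run PPO or
        # Q-learning here instead.
        self.theta += self.lr * reward * (action - self.theta)

def joint_reward(actions, target=1.0):
    """Toy cooperative reward: agents want their joint mean action near target."""
    mean_action = sum(actions) / len(actions)
    return -(mean_action - target) ** 2

def staged_learning(n_agents=3, n_stages=30, steps_per_stage=200):
    agents = [Agent(lr=0.5) for _ in range(n_agents)]
    for stage in range(n_stages):
        # SIBR: only one agent learns per stage; the others keep their
        # policies fixed (their effective learning rate is zero).
        learner_idx = stage % n_agents
        for _ in range(steps_per_stage):
            actions = [a.act() for a in agents]
            r = joint_reward(actions)
            agents[learner_idx].update(r, actions[learner_idx])
    return [a.theta for a in agents]

if __name__ == "__main__":
    print([round(t, 3) for t in staged_learning()])

A parallel (PIBR) variant would instead let every agent update at every step; the paper's motivating estimation problem is a case where that parallel scheme fails to converge while the staged scheme above does.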
Cite
@inproceedings{nekoei2022iclrw-staged,
title = {{Staged Independent Learning: Towards Decentralized Cooperative Multi-Agent Reinforcement Learning}},
author = {Nekoei, Hadi and Badrinaaraayanan, Akilesh and Sinha, Amit and Amini, Mohammad and Rajendran, Janarthanan and Mahajan, Aditya and Chandar, Sarath},
booktitle = {ICLR 2022 Workshops: GMS},
year = {2022},
url = {https://mlanthology.org/iclrw/2022/nekoei2022iclrw-staged/}
}