Multi-Agent Reinforcement Learning in Stochastic Networked Systems

Abstract

We study multi-agent reinforcement learning (MARL) in a stochastic network of agents. The objective is to find localized policies that maximize the (discounted) global reward. In general, scalability is a challenge in this setting because the size of the global state/action space can be exponential in the number of agents. Scalable algorithms are only known in cases where dependencies are static, fixed and local, e.g., between neighbors in a fixed, time-invariant underlying graph. In this work, we propose a Scalable Actor Critic framework that applies in settings where the dependencies can be non-local and stochastic, and provide a finite-time error bound that shows how the convergence rate depends on the speed of information spread in the network. Additionally, as a byproduct of our analysis, we obtain novel finite-time convergence results for a general stochastic approximation scheme and for temporal difference learning with state aggregation, which apply beyond the setting of MARL in networked systems.
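For readers unfamiliar with the setup, the following is a minimal sketch of the networked MARL objective that the abstract refers to; the notation (n agents, local states s_i, local actions a_i, neighborhoods N_i, and localized policy parameters θ_i) reflects the standard formulation in this line of work and is our assumption, not an excerpt from the paper:

\[
\max_{\theta = (\theta_1, \dots, \theta_n)} \; J(\theta) \;=\; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \, \frac{1}{n} \sum_{i=1}^{n} r_i\big(s_i(t), a_i(t)\big) \right],
\qquad a_i(t) \sim \pi_i^{\theta_i}\big(\cdot \mid s_{\mathcal{N}_i}(t)\big).
\]

Under this kind of formulation, each agent acts only on local information while the reward being maximized is global; because the global state is the tuple (s_1, ..., s_n), the joint state space has size \prod_i |S_i|, which is exponential in the number of agents and is what makes centralized approaches unscalable.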

Cite

Text

Lin et al. "Multi-Agent Reinforcement Learning in Stochastic Networked Systems." Neural Information Processing Systems, 2021.

Markdown

[Lin et al. "Multi-Agent Reinforcement Learning in Stochastic Networked Systems." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/lin2021neurips-multiagent/)

BibTeX

@inproceedings{lin2021neurips-multiagent,
  title     = {{Multi-Agent Reinforcement Learning in Stochastic Networked Systems}},
  author    = {Lin, Yiheng and Qu, Guannan and Huang, Longbo and Wierman, Adam},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/lin2021neurips-multiagent/}
}