Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems

Abstract

We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact locally, and where the objective is to find localized policies that maximize the (discounted) global reward. A fundamental challenge in this setting is that the state-action space size scales exponentially with the number of agents, rendering the problem intractable for large networks. In this paper, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a localized policy that is an $O(\rho^\kappa)$-approximation of a stationary point of the objective for some $\rho\in(0,1)$, with complexity that scales with the local state-action space size of the largest $\kappa$-hop neighborhood of the network.
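
A minimal sketch of the structural idea, assuming a simple adjacency-list interaction graph: each agent's policy acts only on its own local state, and the learning complexity is governed by $\kappa$-hop neighborhoods rather than the full network. The names (kappa_hop_neighborhood, LocalizedSoftmaxPolicy) and the tabular softmax parameterization are illustrative assumptions, not the authors' implementation.

from collections import deque

import numpy as np


def kappa_hop_neighborhood(adj, i, kappa):
    """Agents within kappa hops of agent i in the interaction graph (BFS).
    adj: dict mapping each agent to a list of its neighbors."""
    seen = {i}
    frontier = deque([(i, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == kappa:
            continue
        for j in adj[node]:
            if j not in seen:
                seen.add(j)
                frontier.append((j, dist + 1))
    return sorted(seen)


class LocalizedSoftmaxPolicy:
    """Tabular softmax policy for one agent, conditioned only on that
    agent's own state; the kappa-hop sets above would bound the state
    information used by a truncated critic, not by the policy itself."""

    def __init__(self, n_states, n_actions, seed=0):
        self.theta = np.zeros((n_states, n_actions))
        self.rng = np.random.default_rng(seed)

    def probs(self, s_i):
        logits = self.theta[s_i]
        z = np.exp(logits - logits.max())
        return z / z.sum()

    def sample(self, s_i):
        return self.rng.choice(self.theta.shape[1], p=self.probs(s_i))


# Example: a 4-agent line graph 0-1-2-3; the 1-hop neighborhood of agent 1
# is {0, 1, 2}, so a truncated critic for agent 1 only tracks those states.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(kappa_hop_neighborhood(adj, 1, kappa=1))  # [0, 1, 2]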

Cite

Text

Qu et al. "Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.

Markdown

[Qu et al. "Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems." Proceedings of the 2nd Conference on Learning for Dynamics and Control, 2020.](https://mlanthology.org/l4dc/2020/qu2020l4dc-scalable/)

BibTeX

@inproceedings{qu2020l4dc-scalable,
  title     = {{Scalable Reinforcement Learning of Localized Policies for Multi-Agent Networked Systems}},
  author    = {Qu, Guannan and Wierman, Adam and Li, Na},
  booktitle = {Proceedings of the 2nd Conference on Learning for Dynamics and Control},
  year      = {2020},
  pages     = {256--266},
  volume    = {120},
  url       = {https://mlanthology.org/l4dc/2020/qu2020l4dc-scalable/}
}