Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning

Abstract

This paper studies a policy optimization problem arising from collaborative multi-agent reinforcement learning in a decentralized setting, where agents communicate with their neighbors over an undirected graph to maximize the sum of their cumulative rewards. A novel decentralized natural policy gradient method, dubbed Momentum-based Decentralized Natural Policy Gradient (MDNPG), is proposed; it incorporates the natural gradient, momentum-based variance reduction, and gradient tracking into the decentralized stochastic gradient ascent framework. An $\mathcal{O}(n^{-1}\epsilon^{-3})$ sample complexity for MDNPG to converge to an $\epsilon$-stationary point is established under standard assumptions, where $n$ is the number of agents. This shows that MDNPG achieves the optimal convergence rate for decentralized policy gradient methods and enjoys a linear speedup over centralized optimization methods. Moreover, extensive numerical experiments demonstrate the superior empirical performance of MDNPG over other state-of-the-art algorithms.
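The abstract names three ingredients layered on decentralized stochastic gradient ascent: natural-gradient preconditioning, STORM-style momentum-based variance reduction, and gradient tracking over the communication graph. The sketch below illustrates how these pieces typically fit together on a toy maximization problem; it is not the paper's exact MDNPG update. The ring mixing matrix `W`, the quadratic local objectives, the identity stand-in for the Fisher information, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 8, 5                       # number of agents, parameter dimension

# Ring-graph mixing matrix (doubly stochastic), modelling the
# undirected communication graph: each agent averages with its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

# Toy local objectives: agent i maximizes -0.5 * ||theta - c_i||^2,
# so the team optimum is the average of the c_i.
C = rng.normal(size=(n, d))

def grad(i, theta, xi):
    """Stochastic ascent direction of agent i's local objective,
    evaluated with a given noise sample xi."""
    return -(theta - C[i]) + xi

def natural_direction(g):
    """Stand-in for natural-gradient preconditioning: solve F x = g with a
    fixed positive-definite F (a real policy-gradient method would estimate
    the Fisher information of the policy)."""
    F = np.eye(d)                 # placeholder Fisher estimate (assumption)
    return np.linalg.solve(F, g)

eta, beta, T, sigma = 0.1, 0.2, 300, 0.1   # step size, momentum, iterations, noise level
theta = np.tile(rng.normal(size=d), (n, 1))                      # local parameters
xi0 = sigma * rng.normal(size=(n, d))
v = np.array([grad(i, theta[i], xi0[i]) for i in range(n)])      # momentum estimates
y = v.copy()                                                     # gradient trackers

for t in range(T):
    # Consensus step plus preconditioned ascent along the tracked gradient.
    theta_new = W @ theta + eta * np.array([natural_direction(y[i]) for i in range(n)])

    # STORM-style momentum-based variance reduction: the SAME sample xi is
    # evaluated at both the new and the old parameters.
    v_new = np.empty_like(v)
    for i in range(n):
        xi = sigma * rng.normal(size=d)
        v_new[i] = grad(i, theta_new[i], xi) + (1 - beta) * (v[i] - grad(i, theta[i], xi))

    # Gradient tracking: mix trackers over the graph, then add the increment
    # of the local momentum estimates.
    y = W @ y + v_new - v
    theta, v = theta_new, v_new

print("distance of mean iterate to team optimum:",
      np.linalg.norm(theta.mean(axis=0) - C.mean(axis=0)))
print("consensus error:", np.linalg.norm(theta - theta.mean(axis=0)))
```

In the actual MDNPG setting the local gradients would be stochastic policy gradients estimated from sampled trajectories and the preconditioner would be an estimate of the policy's Fisher information; the toy quadratic problem is used here only to keep the update structure visible and runnable.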

Cite

Text

Chen et al. "Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning." Journal of Machine Learning Research, 2024.

Markdown

[Chen et al. "Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/chen2024jmlr-decentralized/)

BibTeX

@article{chen2024jmlr-decentralized,
  title     = {{Decentralized Natural Policy Gradient with Variance Reduction for Collaborative Multi-Agent Reinforcement Learning}},
  author    = {Chen, Jinchi and Feng, Jie and Gao, Weiguo and Wei, Ke},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--49},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/chen2024jmlr-decentralized/}
}