Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning
Abstract
Human coordination often benefits from executing actions in a correlated manner, which leads to improved cooperation. This concept holds potential for enhancing cooperative multi-agent reinforcement learning (MARL). Despite this, recent advances in MARL predominantly focus on decentralized execution, which favors scalability by avoiding action correlation among agents. A recent study introduced a Bayesian network to incorporate correlations between agents' action selections into their joint policy, and demonstrated global convergence to Nash equilibria under a tabular softmax policy parameterization in cooperative Markov games. In this work, we extend these theoretical results by establishing the convergence rate of policy gradient for the Bayesian network joint policy with log-barrier regularization.
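To make the setup concrete, the following is a minimal sketch (not the authors' implementation) of a joint policy factored along a Bayesian network over agents with tabular softmax parameters, plus an illustrative log-barrier term; the DAG `parents`, the table shapes, and the normalization of the penalty are assumptions made for illustration only.

```python
# Sketch: Bayesian-network-factored tabular softmax joint policy (illustrative only).
# Agent i conditions on the state and on the actions of its parents pa(i) in an assumed DAG.
from itertools import product
import numpy as np

n_agents, n_states, n_actions = 3, 4, 2
parents = {0: [], 1: [0], 2: [0, 1]}   # assumed DAG over agents, listed in topological order
rng = np.random.default_rng(0)

# One logit table per agent, indexed by (state, flattened parent-action profile, own action).
logits = {
    i: rng.normal(size=(n_states, n_actions ** len(parents[i]), n_actions))
    for i in range(n_agents)
}

def parent_index(joint_action, pa):
    """Flatten the parents' actions into a single row index of the logit table."""
    idx = 0
    for p in pa:
        idx = idx * n_actions + joint_action[p]
    return idx

def joint_policy_prob(state, joint_action):
    """pi(a | s) = prod_i pi_i(a_i | s, a_{pa(i)}) under the tabular softmax tables."""
    prob = 1.0
    for i in range(n_agents):
        z = logits[i][state, parent_index(joint_action, parents[i])]
        p = np.exp(z - z.max())
        p /= p.sum()
        prob *= p[joint_action[i]]
    return prob

def log_barrier_penalty(lam=0.1):
    """Assumed log-barrier term: lam times the mean of log pi_i over all table entries."""
    total, count = 0.0, 0
    for i in range(n_agents):
        z = logits[i]
        m = z.max(axis=-1, keepdims=True)
        log_probs = z - m - np.log(np.exp(z - m).sum(axis=-1, keepdims=True))
        total += log_probs.sum()
        count += log_probs.size
    return lam * total / count

# Sanity checks: the joint policy sums to 1 over joint actions, and the penalty is finite.
total = sum(joint_policy_prob(0, a) for a in product(range(n_actions), repeat=n_agents))
print(round(total, 6), log_barrier_penalty())
```

In this factorization, a fully connected DAG recovers an autoregressive joint policy, while an edgeless DAG reduces to independent per-agent (product) policies.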
Cite
Text
Chen et al. "Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning." NeurIPS 2024 Workshops: BDU, 2024.
Markdown
[Chen et al. "Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning." NeurIPS 2024 Workshops: BDU, 2024.](https://mlanthology.org/neuripsw/2024/chen2024neuripsw-convergence/)
BibTeX
@inproceedings{chen2024neuripsw-convergence,
title = {{Convergence Rates of Bayesian Network Policy Gradient for Cooperative Multi-Agent Reinforcement Learning}},
author = {Chen, Dingyang and Zhang, Zhenyu and Kuang, Xiaolong and Shen, Xinyang and Ozer, Ozalp and Zhang, Qi},
booktitle = {NeurIPS 2024 Workshops: BDU},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/chen2024neuripsw-convergence/}
}