Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation

Hsu, Hao-Lun; Pajic, Miroslav


L4DC 2025 pp. 1353-1364


Abstract

Cooperative multi-agent reinforcement learning (MARL) has shown significant promise in dynamic control environments, where effective communication and tailored exploration strategies facilitate collaboration. However, ensuring safe exploration remains challenging, as even a single unsafe action from any agent can lead to severe consequences. To mitigate this risk, we introduce Scoop-LSVI, a UCB-based cooperative parallel RL framework that achieves low cumulative regret with minimal communication demands while adhering to safety constraints. This framework enables multiple agents to concurrently solve isolated Markov Decision Processes (MDPs) and share information to enhance learning efficiency. Scoop-LSVI attains a regret of $\tilde{O}(\kappa d^{3/2} H^2 \sqrt{MK})$, where $d$ is the feature dimension, $H$ is the horizon length, $M$ is the number of agents, $K$ is the number of episodes for each agent, and $\kappa$ is a factor that reflects the safety constraints. This result aligns with state-of-the-art findings for unsafe cooperative MARL and also matches the regret bounds of UCB-based single-agent RL algorithms ($M = 1$), highlighting the potential of Scoop-LSVI to support safe and efficient learning in cooperative MARL applications.
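The abstract does not spell out Scoop-LSVI's update rules. As a rough, hedged illustration of the generic ingredient that UCB-based least-squares value iteration methods (the LSVI-UCB family) rely on, the sketch below pools feature vectors from several agents into one regularized Gram matrix $\Lambda = \lambda I + \sum_t \phi_t \phi_t^\top$ and computes the exploration bonus $\beta \sqrt{\phi^\top \Lambda^{-1} \phi}$. Sharing samples makes the bonus shrink faster, which is the intuition behind cooperative speedups. The names `SharedGram`, `bonus`, `regret_order`, and the parameters `lam` and `beta` are hypothetical; safety constraints, the communication schedule, and the value-iteration backup are omitted, so this is not the paper's algorithm. `regret_order` only evaluates the stated bound with log factors dropped.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def identity_scaled(d: int, scale: float) -> Matrix:
    """Return scale * I_d as a nested list."""
    return [[scale if i == j else 0.0 for j in range(d)] for i in range(d)]


def mat_vec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    """Inner product x^T y."""
    return sum(x_i * y_i for x_i, y_i in zip(x, y))


class SharedGram:
    """Hypothetical helper: inverse of Lambda = lam*I + sum phi phi^T,
    pooled over all agents' samples (generic LSVI-UCB ingredient, not
    Scoop-LSVI itself; no safety handling)."""

    def __init__(self, d: int, lam: float = 1.0) -> None:
        self.inv = identity_scaled(d, 1.0 / lam)

    def add(self, phi: Vector) -> None:
        # Sherman-Morrison rank-one update of Lambda^{-1}:
        # (A + phi phi^T)^{-1} = A^{-1} - u u^T / (1 + phi^T u), u = A^{-1} phi
        u = mat_vec(self.inv, phi)  # valid because Lambda^{-1} is symmetric
        denom = 1.0 + dot(phi, u)
        d = len(phi)
        self.inv = [[self.inv[i][j] - u[i] * u[j] / denom for j in range(d)]
                    for i in range(d)]

    def bonus(self, phi: Vector, beta: float) -> float:
        """UCB exploration bonus beta * sqrt(phi^T Lambda^{-1} phi)."""
        return beta * math.sqrt(dot(phi, mat_vec(self.inv, phi)))


def regret_order(kappa: float, d: int, H: int, M: int, K: int) -> float:
    """Stated bound kappa * d^{3/2} * H^2 * sqrt(M K), log factors dropped."""
    return kappa * d ** 1.5 * H ** 2 * math.sqrt(M * K)


if __name__ == "__main__":
    phi = [1.0, 0.0]
    single = SharedGram(d=2, lam=1.0)
    single.add(phi)                 # one agent contributes one sample
    pooled = SharedGram(d=2, lam=1.0)
    for _ in range(4):              # M = 4 agents each share one sample
        pooled.add(phi)
    print(single.bonus(phi, beta=1.0), pooled.bonus(phi, beta=1.0))
    print(regret_order(1.0, d=4, H=10, M=4, K=100))
```

With the same feature seen once by each of four agents, the pooled bonus in that direction drops from $\sqrt{1/2}$ to $\sqrt{1/5}$, while an unvisited direction keeps the full bonus $\beta/\sqrt{\lambda}$. Sherman-Morrison is used here only to keep the sketch dependency-free at $O(d^2)$ per sample.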


Cite

Text

Hsu and Pajic. "Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation." Proceedings of the 7th Annual Learning for Dynamics & Control Conference, 2025.

Markdown

[Hsu and Pajic. "Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation." Proceedings of the 7th Annual Learning for Dynamics \& Control Conference, 2025.](https://mlanthology.org/l4dc/2025/hsu2025l4dc-safe/)

BibTeX

@inproceedings{hsu2025l4dc-safe,
  title     = {{Safe Cooperative Multi-Agent Reinforcement Learning with Function Approximation}},
  author    = {Hsu, Hao-Lun and Pajic, Miroslav},
  booktitle = {Proceedings of the 7th Annual Learning for Dynamics \& Control Conference},
  year      = {2025},
  pages     = {1353--1364},
  volume    = {283},
  url       = {https://mlanthology.org/l4dc/2025/hsu2025l4dc-safe/}
}