Correlated Policy Optimization in Multi-Agent Subteams
Abstract
In cooperative multi-agent reinforcement learning, agents often face scalability challenges due to the exponential growth of the joint action and observation spaces. Inspired by the structure of human teams, we explore subteam-based coordination, where agents are partitioned into fully correlated subgroups with limited inter-group interaction. We formalize this structure using Bayesian networks and propose a class of correlated joint policies induced by directed acyclic graphs. Theoretically, we prove that regularized policy gradient ascent converges to near-optimal policies under a decomposability condition of the environment. Empirically, we introduce a heuristic for dynamically constructing context-aware subteams with limited dependency budgets, and demonstrate that our method outperforms standard baselines across multiple benchmark environments.
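As a rough illustration of what a joint policy induced by a directed acyclic graph can look like, the sketch below uses the standard Bayesian-network factorization, in which each agent's action is conditioned on the state and on the actions of its parent agents. This is not the paper's implementation: the three-agent graph in `PARENTS`, the toy `conditional` tables, and the function names are all hypothetical and chosen only to make the factorization concrete.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical setup: 3 agents with 2 actions each. Agents 0 and 1 form a
# subteam (agent 1 conditions on agent 0's action); agent 2 acts alone.
# PARENTS maps each agent to the agents whose actions it observes.
PARENTS = {0: (), 1: (0,), 2: ()}


def conditional(agent, state, parent_actions):
    """Toy per-agent policy returning [P(a=0), P(a=1)]. Illustrative only."""
    if agent == 1:
        # Agent 1 matches agent 0's action with probability 0.9.
        return [0.9, 0.1] if parent_actions[0] == 0 else [0.1, 0.9]
    return [0.5, 0.5]


def joint_prob(state, actions):
    """Probability of a joint action as a product of per-agent conditionals."""
    prob = 1.0
    for agent, parent_ids in PARENTS.items():
        parent_actions = tuple(actions[p] for p in parent_ids)
        prob *= conditional(agent, state, parent_actions)[actions[agent]]
    return prob


def sample_joint_action(state, rng):
    """Ancestral sampling: visit agents so that parents are sampled first."""
    actions = {}
    for agent in TopologicalSorter(PARENTS).static_order():
        parent_actions = tuple(actions[p] for p in PARENTS[agent])
        dist = conditional(agent, state, parent_actions)
        actions[agent] = rng.choices(range(len(dist)), weights=dist)[0]
    return actions


if __name__ == "__main__":
    rng = random.Random(0)
    sampled = sample_joint_action(state=None, rng=rng)
    print(sampled, joint_prob(None, sampled))
```

In this toy graph, the only dependency edge is inside the subteam, so the joint distribution needs far fewer parameters than a full table over all joint actions, which is the scalability motivation the abstract describes.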
Cite
Text
Chen et al. "Correlated Policy Optimization in Multi-Agent Subteams." International Conference on Learning Representations, 2026.
Markdown
[Chen et al. "Correlated Policy Optimization in Multi-Agent Subteams." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-correlated/)
BibTeX
@inproceedings{chen2026iclr-correlated,
title = {{Correlated Policy Optimization in Multi-Agent Subteams}},
author = {Chen, Dingyang and Ye, Jianing and Zhang, Zhenyu and Kuang, Xiaolong and Shen, Xinyang and Ozer, Ozalp and Zhang, Chongjie and Zhang, Qi},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/chen2026iclr-correlated/}
}