Correlated Policy Optimization in Multi-Agent Subteams

Abstract

In cooperative multi-agent reinforcement learning, agents often face scalability challenges due to the exponential growth of the joint action and observation spaces. Inspired by the structure of human teams, we explore subteam-based coordination, where agents are partitioned into fully correlated subgroups with limited inter-group interaction. We formalize this structure using Bayesian networks and propose a class of correlated joint policies induced by directed acyclic graphs. Theoretically, we prove that regularized policy gradient ascent converges to near-optimal policies under a decomposability condition of the environment. Empirically, we introduce a heuristic for dynamically constructing context-aware subteams with limited dependency budgets, and demonstrate that our method outperforms standard baselines across multiple benchmark environments.
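
To make the idea of a joint policy induced by a directed acyclic graph concrete, here is a minimal sketch based on the standard Bayesian-network factorization, in which each agent's action distribution is conditioned on the state and on the actions of its parent agents. It is not the paper's implementation. The names `parents`, `conditionals`, `joint_probability`, and `sample_joint_action`, and the two-agent toy setup, are all hypothetical and chosen only for illustration.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def joint_probability(parents, conditionals, state, joint_action):
    """Probability of a joint action as a product of per-agent conditionals.

    parents:      dict mapping each agent to a tuple of its parent agents (a DAG).
    conditionals: dict mapping each agent to a function
                  (state, parent_actions) -> {action: probability}.
    """
    prob = 1.0
    for agent, agent_parents in parents.items():
        parent_actions = tuple(joint_action[p] for p in agent_parents)
        dist = conditionals[agent](state, parent_actions)
        prob *= dist[joint_action[agent]]
    return prob


def sample_joint_action(parents, conditionals, state, rng):
    """Ancestral sampling: visit agents so that parents act before children."""
    joint_action = {}
    for agent in TopologicalSorter(parents).static_order():
        parent_actions = tuple(joint_action[p] for p in parents[agent])
        dist = conditionals[agent](state, parent_actions)
        actions = list(dist)
        weights = [dist[a] for a in actions]
        joint_action[agent] = rng.choices(actions, weights=weights)[0]
    return joint_action


# Hypothetical toy subteam: agent "b" depends on agent "a" and copies its action.
parents = {"a": (), "b": ("a",)}
conditionals = {
    "a": lambda state, pa: {0: 0.5, 1: 0.5},
    "b": lambda state, pa: {0: 1.0, 1: 0.0} if pa[0] == 0 else {0: 0.0, 1: 1.0},
}

rng = random.Random(0)
samples = [sample_joint_action(parents, conditionals, None, rng) for _ in range(100)]
```

In this toy example the two agents always choose the same action, so the joint distribution puts probability 0.5 on each matching pair and 0 on each mismatched pair. A product of independent per-agent policies cannot represent that distribution, which is the kind of correlation a dependency edge in the graph allows.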

Cite

Text

Chen et al. "Correlated Policy Optimization in Multi-Agent Subteams." International Conference on Learning Representations, 2026.

Markdown

[Chen et al. "Correlated Policy Optimization in Multi-Agent Subteams." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-correlated/)

BibTeX

@inproceedings{chen2026iclr-correlated,
  title     = {{Correlated Policy Optimization in Multi-Agent Subteams}},
  author    = {Chen, Dingyang and Ye, Jianing and Zhang, Zhenyu and Kuang, Xiaolong and Shen, Xinyang and Ozer, Ozalp and Zhang, Chongjie and Zhang, Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chen2026iclr-correlated/}
}