Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs

Zhao, Yujie; Hu, Lanxiang; Wang, Yang; Hou, Minmin; Zhang, Hao; Ding, Ke; Zhao, Jishen

ICLR 2026

Abstract

Multi-agent systems (MAS) and reinforcement learning (RL) are both widely adopted to improve the agentic performance of large language models (LLMs). MAS strengthens task-specialized performance via role-based orchestration; RL leverages environment rewards to train stronger policies, for example through Group Relative Policy Optimization (GRPO)-style optimization. Yet applying on-policy RL training to MAS remains underexplored and, while promising, poses several challenges. On the algorithm side, standard GRPO grouping assumptions fail in MAS because prompts differ by role and turn. On the system side, the training system needs to support MAS-workflow-based rollouts and on-policy updates for both single and multiple policy models. To address these issues, we introduce AT-GRPO, consisting of (i) an Agent- and Turn-wise grouped RL algorithm tailored for MAS and (ii) a system to support both single-policy and multi-policy training. Across game, planning, coding, and math tasks, AT-GRPO demonstrates substantial performance gains. In particular, on long-horizon planning tasks, AT-GRPO boosts accuracy from a 14.0–47.0% single-agent RL baseline to 96.0–99.5%. Furthermore, it improves reasoning performance, with an average gain of 3.87–7.62% on coding and 9.0–17.93% on math. The code is available at https://github.com/pettingllms-ai/PettingLLMs.
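
To make the grouping issue concrete, the following is a minimal sketch of a group-relative advantage computed within agent- and turn-wise groups. It is not the paper's implementation: the abstract does not specify the algorithm's details, so the function name `grouped_advantages`, the sample fields (`task`, `agent`, `turn`, `reward`), the role names, and the mean/standard-deviation normalization are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def grouped_advantages(samples, eps=1e-8):
    """Return one advantage per sample, normalized within its group.

    Hypothetical sample format (an assumption, not from the paper):
    {"task": ..., "agent": ..., "turn": ..., "reward": float}
    """
    # Samples are comparable only if they share the same task instance,
    # the same agent role, and the same turn index, since the prompt
    # differs across roles and turns.
    groups = defaultdict(list)
    for index, sample in enumerate(samples):
        key = (sample["task"], sample["agent"], sample["turn"])
        groups[key].append(index)

    advantages = [0.0] * len(samples)
    for indices in groups.values():
        rewards = [samples[i]["reward"] for i in indices]
        baseline = mean(rewards)   # group mean acts as the baseline
        spread = pstdev(rewards)   # population standard deviation of the group
        for i in indices:
            advantages[i] = (samples[i]["reward"] - baseline) / (spread + eps)
    return advantages


# Hypothetical rollouts: two candidates per (task, agent, turn) group.
rollouts = [
    {"task": "t0", "agent": "coder", "turn": 0, "reward": 1.0},
    {"task": "t0", "agent": "coder", "turn": 0, "reward": 0.0},
    {"task": "t0", "agent": "tester", "turn": 0, "reward": 0.5},
    {"task": "t0", "agent": "tester", "turn": 0, "reward": 0.5},
]
advantages = grouped_advantages(rollouts)
```

In this sketch each response is scored only against responses generated for the same role at the same turn, so a hypothetical "coder" output is never compared with a "tester" output. Pooling all four samples into one group would instead mix rewards from different prompts, which is the grouping assumption the abstract says breaks down in MAS.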

Cite

Text

Zhao et al. "Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs." International Conference on Learning Representations, 2026.

Markdown

[Zhao et al. "Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-strongermas/)

BibTeX

@inproceedings{zhao2026iclr-strongermas,
  title     = {{Stronger-MAS: Multi-Agent Reinforcement Learning for Collaborative LLMs}},
  author    = {Zhao, Yujie and Hu, Lanxiang and Wang, Yang and Hou, Minmin and Zhang, Hao and Ding, Ke and Zhao, Jishen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-strongermas/}
}