Sample-Efficient Multi-Agent RL: An Optimization Perspective

Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang

ICLR 2024

Abstract

We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation. To identify the minimal assumptions needed for sample-efficient learning, we introduce a novel complexity measure for general-sum MGs, the Multi-Agent Decoupling Coefficient (MADC). Using this measure, we propose the first unified algorithmic framework that ensures sample efficiency in learning Nash Equilibrium, Coarse Correlated Equilibrium, and Correlated Equilibrium for both model-based and model-free MARL problems with low MADC. We also show that our algorithm achieves sublinear regret comparable to that of existing works. Moreover, our algorithm combines an equilibrium-solving oracle with a single-objective optimization subprocedure that solves for the regularized payoff of each deterministic joint policy. This avoids solving constrained optimization problems within data-dependent constraints (Jin et al. 2020; Wang et al. 2023) or executing sampling procedures with complex multi-objective optimization problems (Foster et al. 2023), making the algorithm more amenable to empirical implementation.
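To make the phrase "equilibrium-solving oracle" concrete, the following is a minimal toy sketch and not the paper's algorithm. It assumes a one-shot game in which a payoff for every deterministic joint action is already available (how such payoffs would be estimated or regularized is not covered here), and it only searches for pure Nash equilibria, which need not exist in general. The function name `pure_nash_oracle` and the example payoff table are hypothetical and chosen for illustration.

```python
from itertools import product


def pure_nash_oracle(payoffs, action_counts):
    """Return every pure Nash equilibrium of a finite normal-form game.

    payoffs maps each joint action (a tuple with one action index per agent)
    to a tuple of per-agent payoffs. action_counts[i] is the number of
    actions available to agent i.
    """
    equilibria = []
    for joint in product(*(range(k) for k in action_counts)):
        stable = True
        for i, k in enumerate(action_counts):
            current = payoffs[joint][i]
            # Check whether agent i gains by deviating unilaterally.
            for alt in range(k):
                deviated = joint[:i] + (alt,) + joint[i + 1:]
                if payoffs[deviated][i] > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(joint)
    return equilibria


# Hypothetical payoff table: a prisoner's dilemma with
# action 0 = cooperate and action 1 = defect.
PRISONERS_DILEMMA = {
    (0, 0): (-1, -1),
    (0, 1): (-3, 0),
    (1, 0): (0, -3),
    (1, 1): (-2, -2),
}

print(pure_nash_oracle(PRISONERS_DILEMMA, (2, 2)))
```

In this toy game the only joint action from which no agent benefits by deviating alone is mutual defection, `(1, 1)`. An oracle for Coarse Correlated Equilibrium or Correlated Equilibrium would instead return a distribution over joint actions, which this sketch does not attempt.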

Cite

Text

Xiong et al. "Sample-Efficient Multi-Agent RL: An Optimization Perspective." International Conference on Learning Representations, 2024.

Markdown

[Xiong et al. "Sample-Efficient Multi-Agent RL: An Optimization Perspective." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/xiong2024iclr-sampleefficient/)

BibTeX

@inproceedings{xiong2024iclr-sampleefficient,
  title     = {{Sample-Efficient Multi-Agent RL: An Optimization Perspective}},
  author    = {Xiong, Nuoya and Liu, Zhihan and Wang, Zhaoran and Yang, Zhuoran},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/xiong2024iclr-sampleefficient/}
}