Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

Abstract

Mean-Field Multi-Agent Reinforcement Learning (MF-MARL) is attractive in applications involving a large population of homogeneous agents, as it exploits the permutation invariance of agents and avoids the curse of many agents. Most existing results focus only on the online setting, in which agents can interact with the environment during training. In some applications, such as social welfare optimization, however, interacting with societal systems during training can be prohibitive or even unethical. To bridge this gap, we propose SAFARI (peSsimistic meAn-Field vAlue iteRatIon), an algorithm for offline MF-MARL that requires only a small pre-collected dataset of experience. Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class used to parameterize the value function; the bound is independent of the number of agents. Numerical experiments are provided.
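
Written as a display, the guarantee might read roughly as below. This is only a sketch under assumed notation that the abstract does not define: $\hat{\pi}$ for the policy SAFARI returns, $\pi^*$ for the optimal policy, $V_1^{\pi}$ for the value of a policy $\pi$ at the first step, and $C$ as a placeholder for constants (and any logarithmic factors) that the abstract does not state.

```latex
% Assumed notation (not from the abstract): \hat{\pi}, \pi^*, V_1^{\pi}, C.
\[
  \mathrm{SubOpt}(\hat{\pi})
  \;:=\; V_1^{\pi^*} - V_1^{\hat{\pi}}
  \;\le\; C \,\frac{H^2 \, d_{\rm eff}}{\sqrt{N}} .
\]
% Consequence of the 1/\sqrt{N} rate: quadrupling the data halves the bound.
\[
  C \,\frac{H^2 \, d_{\rm eff}}{\sqrt{4N}}
  \;=\; \frac{1}{2} \cdot C \,\frac{H^2 \, d_{\rm eff}}{\sqrt{N}} .
\]
```

The right-hand side contains no term for the number of agents, which is the sense in which the bound avoids the curse of many agents.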

Cite

Text

Chen et al. "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL." Neural Information Processing Systems, 2021.

Markdown

[Chen et al. "Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/chen2021neurips-pessimism/)

BibTeX

@inproceedings{chen2021neurips-pessimism,
  title     = {{Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL}},
  author    = {Chen, Minshuo and Li, Yan and Wang, Ethan and Yang, Zhuoran and Wang, Zhaoran and Zhao, Tuo},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/chen2021neurips-pessimism/}
}