Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning

Abstract

Learning from datasets without interaction with environments (Offline Learning) is an essential step to apply Reinforcement Learning (RL) algorithms in real-world scenarios. However, compared with its single-agent counterpart, offline multi-agent RL involves more agents and larger state and action spaces, which makes it more challenging, yet it has attracted little attention. We demonstrate that current offline RL algorithms are ineffective in multi-agent systems due to the accumulated extrapolation error. In this paper, we propose a novel offline RL algorithm, named Implicit Constraint Q-learning (ICQ), which effectively alleviates the extrapolation error by only trusting the state-action pairs given in the dataset for value estimation. Moreover, we extend ICQ to multi-agent tasks by decomposing the joint policy under the implicit constraint. Experimental results demonstrate that the extrapolation error is successfully controlled within a reasonable range and is insensitive to the number of agents. We further show that ICQ achieves state-of-the-art performance on challenging multi-agent offline tasks (StarCraft II). Our code is publicly available at https://github.com/YiqinYang/ICQ.

Cite

Text

Yang et al. "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning." Neural Information Processing Systems, 2021.

Markdown

[Yang et al. "Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/yang2021neurips-believe/)

BibTeX

@inproceedings{yang2021neurips-believe,
  title     = {{Believe What You See: Implicit Constraint Approach for Offline Multi-Agent Reinforcement Learning}},
  author    = {Yang, Yiqin and Ma, Xiaoteng and Li, Chenghao and Zheng, Zewu and Zhang, Qiyuan and Huang, Gao and Yang, Jun and Zhao, Qianchuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/yang2021neurips-believe/}
}