ResQ: A Residual Q Function-Based Approach for Multi-Agent Reinforcement Learning Value Factorization

Abstract

Value function factorization of the joint state-action value function is an important technique for Multi-Agent Reinforcement Learning (MARL). Existing studies are limited by their representation capability, sample efficiency, and approximation error. To address these challenges, we propose ResQ, a MARL value function factorization method that can find the optimal joint policy for any state-action value function through residual functions. ResQ masks some state-action value pairs of the joint state-action value function, which is decomposed into the sum of a main function and a residual function. ResQ can be used with mean-value and stochastic-value RL. We show theoretically that ResQ satisfies both the individual-global-max (IGM) principle and the distributional IGM principle without representation limitations. Through experiments on matrix games, the predator-prey benchmark, and StarCraft benchmarks, we show that ResQ obtains better results than multiple expected-value and stochastic-value factorization methods.
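To make the residual decomposition in the abstract concrete, below is a minimal, self-contained sketch on a one-step matrix game. It assumes one reading of the abstract: the joint value is written as Q_jt(s, a) = Q_main(s, a) + w(s, a) * Q_r(s, a), where the residual Q_r is non-positive and the mask w is zero at the greedy joint action of Q_main, so masking the residual leaves the argmax of Q_main intact (the IGM property). The variable names, the additive stand-in for the main function, and the specific payoff matrix are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a residual value decomposition (not the paper's code).
import numpy as np

# Non-monotonic 3x3 matrix game: rows are agent 1's actions, columns agent 2's.
q_jt = np.array([[  8., -12., -12.],
                 [-12.,   0.,   0.],
                 [-12.,   0.,   0.]])

# "Main" function built from per-agent utilities (illustrative additive mixing).
q1 = np.array([4., 0., 0.])             # agent 1 utilities (assumed)
q2 = np.array([4., 0., 0.])             # agent 2 utilities (assumed)
q_main = q1[:, None] + q2[None, :]      # greedy joint action is (0, 0)

# Residual absorbs the remaining error; here it is non-positive by construction.
q_res = q_jt - q_main
assert (q_res <= 0).all()

# Mask: zero at the greedy joint action of q_main, one elsewhere, so the
# residual can never change which joint action is optimal.
greedy = np.unravel_index(np.argmax(q_main), q_main.shape)
mask = np.ones_like(q_main)
mask[greedy] = 0.0

# Reconstructed joint value: main function plus masked residual.
q_hat = q_main + mask * q_res

# The joint value is recovered exactly, and the per-agent greedy actions
# (argmax of q1, argmax of q2) coincide with the joint greedy action (IGM).
assert np.allclose(q_hat, q_jt)
assert (np.argmax(q1), np.argmax(q2)) == greedy == np.unravel_index(np.argmax(q_jt), q_jt.shape)
print("greedy joint action:", greedy, "value:", q_hat[greedy])
```

In this toy example the payoff matrix is the classic non-monotonic game that purely monotonic factorizations cannot represent; the masked residual soaks up the non-monotonic part while the main function alone still identifies the optimal joint action from the individual utilities.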

Cite

Text

Shen et al. "ResQ: A Residual Q Function-Based Approach for Multi-Agent Reinforcement Learning Value Factorization." Neural Information Processing Systems, 2022.

Markdown

[Shen et al. "ResQ: A Residual Q Function-Based Approach for Multi-Agent Reinforcement Learning Value Factorization." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/shen2022neurips-resq/)

BibTeX

@inproceedings{shen2022neurips-resq,
  title     = {{ResQ: A Residual Q Function-Based Approach for Multi-Agent Reinforcement Learning Value Factorization}},
  author    = {Shen, Siqi and Qiu, Mengwei and Liu, Jun and Liu, Weiquan and Fu, Yongquan and Liu, Xinwang and Wang, Cheng},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/shen2022neurips-resq/}
}