SPQR: Controlling Q-Ensemble Independence with Spiked Random Model for Reinforcement Learning

Abstract

Alleviating overestimation bias is a critical challenge for deep reinforcement learning if it is to perform well on more complex tasks or on offline datasets containing out-of-distribution data. To overcome overestimation bias, ensemble methods for Q-learning have been investigated to exploit the diversity of multiple Q-functions. Since network initialization has been the predominant approach to promoting diversity in Q-functions, heuristically designed diversity injection methods have also been studied in the literature. However, previous studies have not approached guaranteed independence over an ensemble from a theoretical perspective. By introducing a novel regularization loss for Q-ensemble independence based on random matrix theory, we propose spiked Wishart Q-ensemble independence regularization (SPQR) for reinforcement learning. Specifically, we convert the intractable hypothesis testing criterion for Q-ensemble independence into a tractable KL divergence between the spectral distribution of the Q-ensemble and the target Wigner's semicircle distribution. We implement SPQR in several online and offline ensemble Q-learning algorithms. In experiments, SPQR outperforms the baseline algorithms on both online and offline RL benchmarks.
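
To make the spectral criterion in the abstract concrete, the following is a minimal sketch of a histogram-based KL divergence between the eigenvalue distribution of a symmetrized random matrix and Wigner's semicircle density. It is an illustration under stated assumptions, not the loss used in the paper: the abstract does not say how the matrix is built from the Q-ensemble, so the square input matrix, the function names `semicircle_pdf` and `spectral_kl_to_semicircle`, the bin count, and the smoothing constant `eps` are all hypothetical choices.

```python
import numpy as np


def semicircle_pdf(x, radius=2.0):
    """Wigner semicircle density supported on [-radius, radius]."""
    inside = np.clip(radius**2 - np.asarray(x, dtype=float) ** 2, 0.0, None)
    return 2.0 / (np.pi * radius**2) * np.sqrt(inside)


def spectral_kl_to_semicircle(matrix, bins=30, eps=1e-8):
    """Estimate KL(empirical spectrum || semicircle) on a fixed histogram.

    `matrix` is assumed to be a square array whose entries would be
    i.i.d. with zero mean and unit variance if the ensemble members
    were independent (a hypothetical stand-in for Q-ensemble outputs).
    """
    n = matrix.shape[0]
    # Symmetrize and rescale so that, for i.i.d. unit-variance entries,
    # the spectrum approaches the semicircle law on [-2, 2].
    sym = (matrix + matrix.T) / np.sqrt(2.0 * n)
    eigvals = np.linalg.eigvalsh(sym)

    edges = np.linspace(-2.5, 2.5, bins + 1)
    # Clip outliers (spikes) into the edge bins so they are penalized
    # instead of being silently dropped by the histogram.
    eigvals = np.clip(eigvals, edges[0], edges[-1])
    counts, _ = np.histogram(eigvals, bins=edges)
    p = counts / counts.sum()

    centers = 0.5 * (edges[:-1] + edges[1:])
    width = edges[1] - edges[0]
    q = semicircle_pdf(centers) * width + eps  # smoothed target mass
    q = q / q.sum()

    mask = p > 0
    return float(np.sum(p[mask] * (np.log(p[mask]) - np.log(q[mask]))))


# Example: independent entries versus entries sharing a common component.
rng = np.random.default_rng(0)
independent = rng.standard_normal((200, 200))
correlated = 1.0 + 0.1 * rng.standard_normal((200, 200))
print(spectral_kl_to_semicircle(independent))
print(spectral_kl_to_semicircle(correlated))
```

In this sketch the correlated matrix produces a large outlying eigenvalue and a compressed bulk, so its estimated divergence from the semicircle target should be noticeably larger than for the independent matrix. A regularizer used during training would need a differentiable estimate rather than a hard histogram, which this example does not attempt.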

Cite

Text

Lee et al. "SPQR: Controlling Q-Ensemble Independence with Spiked Random Model for Reinforcement Learning." Neural Information Processing Systems, 2023.

Markdown

[Lee et al. "SPQR: Controlling Q-Ensemble Independence with Spiked Random Model for Reinforcement Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/lee2023neurips-spqr/)

BibTeX

@inproceedings{lee2023neurips-spqr,
  title     = {{SPQR: Controlling Q-Ensemble Independence with Spiked Random Model for Reinforcement Learning}},
  author    = {Lee, Dohyeok and Han, Seungyub and Cho, Taehyun and Lee, Jungwoo},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/lee2023neurips-spqr/}
}