Double Gumbel Q-Learning
Abstract
We show that Deep Neural Networks introduce two heteroscedastic Gumbel noise sources into Q-Learning. To account for these noise sources, we propose Double Gumbel Q-Learning, a Deep Q-Learning algorithm applicable to both discrete and continuous control. In discrete control, we derive a closed-form expression for the loss function of our algorithm. In continuous control, this loss function is intractable, and we therefore derive an approximation with a hyperparameter whose value regulates pessimism in Q-Learning. We present a default value for our pessimism hyperparameter that enables DoubleGum to outperform DDPG, TD3, SAC, XQL, quantile regression, and Mixture-of-Gaussian Critics in aggregate over 33 tasks from DeepMind Control, MuJoCo, MetaWorld, and Box2D, and show that tuning this hyperparameter may further improve sample efficiency.
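Note (not part of the published abstract): one standard distributional identity helps contextualize the "two Gumbel noise sources" and the closed-form loss mentioned above. If the noise on the online and target value estimates is modelled by independent Gumbel variables sharing a scale $\beta$ (a modelling assumption for illustration, not a quote from the paper), their difference is logistic with the same scale:

\[
X \sim \mathrm{Gumbel}(\mu_1, \beta), \quad Y \sim \mathrm{Gumbel}(\mu_2, \beta), \quad X \perp Y \;\Longrightarrow\; X - Y \sim \mathrm{Logistic}(\mu_1 - \mu_2, \beta).
\]

The logistic density has a closed form, which is consistent with the abstract's claim of a closed-form loss in discrete control; the exact loss used by DoubleGum is derived in the paper itself.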
Cite
Text
Hui et al. "Double Gumbel Q-Learning." Neural Information Processing Systems, 2023.
Markdown
[Hui et al. "Double Gumbel Q-Learning." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/hui2023neurips-double/)
BibTeX
@inproceedings{hui2023neurips-double,
  title     = {{Double Gumbel Q-Learning}},
  author    = {Hui, David Yu-Tung and Courville, Aaron C. and Bacon, Pierre-Luc},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/hui2023neurips-double/}
}