A Cramér Distance Perspective on Quantile Regression Based Distributional Reinforcement Learning
Abstract
Distributional reinforcement learning (DRL) extends the value-based approach by approximating the full distribution over future returns rather than only its mean, providing a richer signal that leads to improved performance. Quantile Regression (QR)-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions by minimizing the 1-Wasserstein distance. However, because the sample gradients of this distance are biased, the quantile regression loss is used for training instead; it has the same minimizer and enjoys unbiased gradients. Non-crossing constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies. This work considers the setting of fixed quantile levels and makes two contributions. First, we prove that the Cramér distance yields a projection that coincides with the 1-Wasserstein one and that, under non-crossing constraints, the squared Cramér and quantile regression losses yield collinear gradients, shedding light on the connection between these important elements of DRL. Second, we propose a low-complexity algorithm to compute the Cramér distance.
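The abstract names several objects without giving formulas. The following NumPy sketch makes them concrete under stated assumptions: the target is a discrete distribution given as weighted atoms, and all function names are hypothetical (they are not taken from the paper). `quantile_midpoints` uses the fixed levels tau_hat_i = (2i - 1)/(2N) employed by QR-DQN, `w1_projection` returns the atoms theta_i = F^{-1}(tau_hat_i) of the 1-Wasserstein projection onto N equally weighted atoms, `quantile_regression_loss` is the expected pinball loss, and `cramer_distance_sq` evaluates the squared Cramér distance, the integral of (F - G)^2, by sorting the merged support in O((n + m) log(n + m)) time. That last routine is a generic sort-and-accumulate computation and is not necessarily the low-complexity algorithm proposed in the paper.

```python
import numpy as np

# Minimal sketch; assumes discrete distributions given as (atoms, weights).
# All names are hypothetical illustrations, not the paper's code.


def quantile_midpoints(n):
    """Fixed quantile levels tau_hat_i = (2i - 1) / (2N), i = 1..N, as in QR-DQN."""
    return (2.0 * np.arange(1, n + 1) - 1.0) / (2.0 * n)


def w1_projection(x, p, n):
    """Atoms of the 1-Wasserstein projection of sum_j p_j delta_{x_j} onto staircase
    distributions with N equally weighted atoms: theta_i = F^{-1}(tau_hat_i)."""
    order = np.argsort(x)
    x = np.asarray(x, dtype=float)[order]
    cdf = np.cumsum(np.asarray(p, dtype=float)[order])
    # Generalized inverse CDF: first atom whose cumulative mass reaches tau_hat_i.
    idx = np.searchsorted(cdf, quantile_midpoints(n), side="left")
    return x[np.minimum(idx, len(x) - 1)]


def cramer_distance_sq(x, p, y, q):
    """Squared Cramér distance, i.e. the integral of (F(t) - G(t))^2 dt, between
    sum_i p_i delta_{x_i} and sum_j q_j delta_{y_j} (generic O(k log k) routine)."""
    support = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    # Signed masses: +p for atoms of the first distribution, -q for the second.
    signed = np.concatenate([np.asarray(p, float), -np.asarray(q, float)])
    order = np.argsort(support, kind="mergesort")
    support, signed = support[order], signed[order]
    # F - G is piecewise constant; on [support[k], support[k+1]) it equals cumsum[k].
    diff_cdf = np.cumsum(signed)
    widths = np.diff(support)
    # Beyond the last atom F - G = 0, so the final cumulative value is dropped.
    return float(np.sum(diff_cdf[:-1] ** 2 * widths))


def quantile_regression_loss(theta, x, p):
    """Expected pinball loss sum_i E[rho_{tau_hat_i}(Z - theta_i)] for
    Z ~ sum_j p_j delta_{x_j}, where rho_tau(u) = u * (tau - 1{u < 0})."""
    theta = np.asarray(theta, float)
    tau = quantile_midpoints(len(theta))
    u = np.asarray(x, float)[:, None] - theta[None, :]  # shape (num_atoms, N)
    rho = u * (tau[None, :] - (u < 0).astype(float))
    # Weight each target atom by its probability and sum over the N quantiles.
    return float(np.asarray(p, float) @ rho.sum(axis=1))
```

For instance, with atoms x = [0, 1, 3] and weights p = [0.2, 0.5, 0.3], `w1_projection(x, p, 4)` returns atoms at 0, 1, 1 and 3. Evaluating `cramer_distance_sq` against perturbed atom sets is one way to observe numerically that this projection also minimizes the Cramér distance, and finite differences of the two losses at sorted (non-crossing) atoms illustrate the collinearity stated above; with the losses defined as in this sketch, a short calculation gives a proportionality factor of 2/N.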
Cite
Lheritier and Bondoux. "A Cramér Distance Perspective on Quantile Regression Based Distributional Reinforcement Learning." Artificial Intelligence and Statistics, 2022.
[Lheritier and Bondoux. "A Cramér Distance Perspective on Quantile Regression Based Distributional Reinforcement Learning." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/lheritier2022aistats-cramer/)
@inproceedings{lheritier2022aistats-cramer,
title = {{A Cramér Distance Perspective on Quantile Regression Based Distributional Reinforcement Learning}},
author = {Lheritier, Alix and Bondoux, Nicolas},
booktitle = {Artificial Intelligence and Statistics},
year = {2022},
pages = {5774-5789},
volume = {151},
url = {https://mlanthology.org/aistats/2022/lheritier2022aistats-cramer/}
}