Distributional Deep Q-Learning with CVaR Regression

Abstract

Reinforcement learning (RL) allows an agent interacting sequentially with an environment to maximize its long-term return, in expectation. In distributional RL (DRL), the agent is also interested in the probability distribution of the return, not just its expected value. This so-called distributional perspective of RL has led to new algorithms with improved empirical performance. In this paper, we recall the atomic DRL (ADRL) framework based on atomic distributions projected via the Wasserstein-2 metric. Then, we derive two new deep ADRL algorithms, namely SAD-Q-learning and MAD-Q-learning (both for the control task). Numerical experiments on various environments compare our approach against existing deep (distributional) RL methods.
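
As a rough illustration of the atomic-distribution idea mentioned in the abstract, the sketch below projects an empirical return distribution onto a fixed number of equally weighted atoms under the Wasserstein-2 metric, then reads off a lower-tail CVaR. It is not the paper's SAD-Q-learning or MAD-Q-learning. The function names `w2_project_to_atoms` and `lower_cvar`, the equal-weight atoms, and the requirement that the sample count be a multiple of the atom count are all assumptions made here for simplicity.

```python
import numpy as np


def w2_project_to_atoms(samples, n_atoms):
    """Wasserstein-2 projection of an empirical distribution onto
    n_atoms equally weighted atoms (hypothetical helper).

    With equal weights, the W2-optimal i-th atom is the average of the
    quantile function over [i/n_atoms, (i+1)/n_atoms]. Assuming the
    sample count is a multiple of n_atoms, that is the mean of the
    i-th block of sorted samples.
    """
    sorted_samples = np.sort(np.asarray(samples, dtype=float))
    if sorted_samples.size % n_atoms != 0:
        raise ValueError("sample count must be a multiple of n_atoms")
    return sorted_samples.reshape(n_atoms, -1).mean(axis=1)


def lower_cvar(atoms, alpha):
    """Mean of the worst alpha-fraction of equally weighted atoms
    (hypothetical helper; assumes alpha * len(atoms) is an integer)."""
    sorted_atoms = np.sort(np.asarray(atoms, dtype=float))
    k = int(round(alpha * sorted_atoms.size))
    if k < 1:
        raise ValueError("alpha is too small for this number of atoms")
    return sorted_atoms[:k].mean()


returns = [8, 1, 7, 2, 6, 3, 5, 4]
atoms = w2_project_to_atoms(returns, n_atoms=4)
print(atoms)
print(lower_cvar(atoms, alpha=0.5))
```

For the eight returns above and four atoms, each atom is the mean of one quantile bin (1.5, 3.5, 5.5, 7.5), so the projection preserves the overall mean, and the CVaR at level 0.5 is the average of the two lowest atoms, 2.5.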

Cite

Text

Achab et al. "Distributional Deep Q-Learning with CVaR Regression." NeurIPS 2022 Workshops: DeepRL, 2022.

BibTeX

@inproceedings{achab2022neuripsw-distributional,
  title     = {{Distributional Deep Q-Learning with CVaR Regression}},
  author    = {Achab, Mastane and Alami, Reda and Djilali, Yasser Abdelaziz Dahou and Fedyanin, Kirill and Moulines, Eric and Panov, Maxim},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/achab2022neuripsw-distributional/}
}