Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model

Abstract

We propose a new algorithm for model-based distributional reinforcement learning (RL), and prove that it is minimax-optimal for approximating return distributions in the generative model regime (up to logarithmic factors), the first result of this kind for any distributional RL algorithm. Our analysis also provides new theoretical perspectives on categorical approaches to distributional RL and introduces a new distributional Bellman equation, the stochastic categorical CDF Bellman equation, which we expect to be of independent interest. Finally, we provide an experimental study comparing a variety of model-based distributional RL algorithms, with several key takeaways for practitioners.
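The abstract refers to categorical approaches to distributional RL and to the generative-model setting without spelling out an algorithm. As background, the sketch below shows standard model-based categorical dynamic programming for policy evaluation: estimate an empirical transition model from generative-model samples, then repeatedly apply the projected distributional Bellman backup on a fixed return support (the projection is the usual two-nearest-atoms mass splitting from categorical distributional RL, as in C51). The function names (`categorical_projection`, `empirical_model`, `categorical_dp`) and the toy two-state chain are illustrative assumptions; this is not the paper's algorithm, nor its new stochastic categorical CDF Bellman recursion.

```python
import numpy as np

def categorical_projection(z, probs, support):
    """Project a categorical distribution (probs on support) onto the fixed
    grid z, splitting each atom's mass between its two nearest grid points
    (the standard projection used in categorical distributional RL)."""
    m, dz = len(z), z[1] - z[0]
    out = np.zeros(m)
    for p, y in zip(probs, support):
        b = (np.clip(y, z[0], z[-1]) - z[0]) / dz   # fractional grid index
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:                  # atom lands exactly on a grid point
            out[lo] += p
        else:
            out[lo] += p * (hi - b)   # mass to the lower neighbour
            out[hi] += p * (b - lo)   # mass to the upper neighbour
    return out

def empirical_model(sampler, num_states, n):
    """Estimate transition probabilities by drawing n next-state samples
    per state from a generative model: sampler(s) -> s'."""
    P_hat = np.zeros((num_states, num_states))
    for s in range(num_states):
        for _ in range(n):
            P_hat[s, sampler(s)] += 1.0 / n
    return P_hat

def categorical_dp(P_hat, r, gamma, z, iters=200):
    """Policy evaluation by iterating the projected distributional Bellman
    operator on the empirical model:
    eta(s) <- sum_s' P_hat(s, s') * Proj_z[(r(s) + gamma * .)_# eta(s')]."""
    S, m = P_hat.shape[0], len(z)
    eta = np.full((S, m), 1.0 / m)        # uniform initial distributions
    for _ in range(iters):
        new = np.zeros_like(eta)
        for s in range(S):
            shifted = r[s] + gamma * z    # support of the bootstrapped target
            for s2 in np.nonzero(P_hat[s])[0]:
                new[s] += P_hat[s, s2] * categorical_projection(z, eta[s2], shifted)
        eta = new
    return eta

# Tiny usage example: a 2-state chain under a fixed policy.
rng = np.random.default_rng(0)
P_true = np.array([[0.9, 0.1], [0.2, 0.8]])
sampler = lambda s: rng.choice(2, p=P_true[s])
P_hat = empirical_model(sampler, num_states=2, n=500)
z = np.linspace(0.0, 10.0, 51)            # fixed support covering 1/(1-gamma)
eta = categorical_dp(P_hat, r=np.array([0.0, 1.0]), gamma=0.9, z=z)
print(eta.sum(axis=1))                    # each row sums to ~1
```

The projection keeps every iterate a probability vector on the fixed grid, so the backup can be iterated to (approximate) convergence on the empirical model; the paper's analysis concerns how many generative-model samples such model-based schemes need to approximate the true return distributions.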

Cite

Text

Rowland et al. "Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model." Neural Information Processing Systems, 2024. doi:10.52202/079017-4221

Markdown

[Rowland et al. "Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/rowland2024neurips-nearminimaxoptimal/) doi:10.52202/079017-4221

BibTeX

@inproceedings{rowland2024neurips-nearminimaxoptimal,
  title     = {{Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model}},
  author    = {Rowland, Mark and Wenliang, Li Kevin and Munos, Rémi and Lyle, Clare and Tang, Yunhao and Dabney, Will},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4221},
  url       = {https://mlanthology.org/neurips/2024/rowland2024neurips-nearminimaxoptimal/}
}