Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics

Abstract

We study the problem of distributional reinforcement learning using categorical parametrisations and a KL divergence loss. Previous analyses of categorical distributional RL have instead relied on a Cramér distance-based loss, which simplifies the analysis but creates a theory-practice gap. We introduce a preconditioned version of the algorithm and prove that it is guaranteed to converge. We further derive the asymptotic variance of the categorical estimates under different learning-rate regimes and compare it to that of classical reinforcement learning. Finally, we empirically validate our theoretical results, investigate the relative strengths of KL-based losses, and derive a number of actionable insights for practitioners.
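
To make the setting concrete, below is a minimal sketch, in plain NumPy, of a tabular categorical distributional TD update driven by a KL loss: the target return distribution is projected onto a fixed support (the standard C51-style projection) and the logits are moved along the cross-entropy gradient. The function names, the logit parametrisation, and the learning-rate handling are illustrative assumptions, and the sketch omits the preconditioning introduced in the paper.

# A minimal sketch of a tabular categorical distributional TD update with a
# KL (cross-entropy) loss. The fixed support, logit parametrisation, and the
# C51-style projection below are standard constructions assumed here for
# illustration; this is not the paper's preconditioned algorithm.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def project_to_support(support, probs, reward, gamma):
    # Project the distribution of reward + gamma * Z onto the fixed support.
    v_min, v_max = support[0], support[-1]
    delta = support[1] - support[0]
    target = np.zeros_like(support)
    for z_j, p_j in zip(support, probs):
        tz = np.clip(reward + gamma * z_j, v_min, v_max)
        b = (tz - v_min) / delta
        lo, hi = int(np.floor(b)), int(np.ceil(b))
        if lo == hi:
            target[lo] += p_j
        else:
            target[lo] += p_j * (hi - b)
            target[hi] += p_j * (b - lo)
    return target

def kl_td_step(logits, x, x_next, reward, gamma, support, lr):
    # The gradient of KL(target || softmax(logits[x])) with respect to
    # logits[x] is softmax(logits[x]) - target, so a gradient step on the
    # KL loss nudges the logits towards the projected target distribution.
    target = project_to_support(support, softmax(logits[x_next]), reward, gamma)
    logits[x] += lr * (target - softmax(logits[x]))
    return logits

In this sketch, logits is an array of shape (num_states, num_atoms), and each call to kl_td_step performs one stochastic update for the transition (x, reward, x_next).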

Cite

Text

Kastner et al. "Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Kastner et al. "Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/kastner2025icml-categorical/)

BibTeX

@inproceedings{kastner2025icml-categorical,
  title     = {{Categorical Distributional Reinforcement Learning with Kullback-Leibler Divergence: Convergence and Asymptotics}},
  author    = {Kastner, Tyler and Rowland, Mark and Tang, Yunhao and Erdogdu, Murat A and Farahmand, Amir-Massoud},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {29294--29320},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/kastner2025icml-categorical/}
}