An Analysis of Distributional Reinforcement Learning with Gaussian Mixtures

Abstract

Distributional Reinforcement Learning (DRL) aims to optimize a risk measure of the return by representing the return's full distribution. Finding such a representation is challenging, however, as it requires a tractable estimate of the risk measure, a tractable loss, and sufficient approximation power. Although Gaussian mixtures (GM) are powerful statistical models well suited to these challenges, only a few papers have investigated this approach, and most use the L$_2$ norm as a tractable metric between GMs. In this paper, we provide new theoretical results on previously unstudied metrics. We show that the L$_2$ metric is not suitable and propose two alternatives: a mixture-specific optimal transport (MW) distance and a maximum mean discrepancy (MMD) distance. Focusing on temporal difference (TD) learning, we prove a convergence result for a related dynamic programming algorithm under the MW metric. Leveraging natural multivariate GM representations, we also highlight the potential of MW for multi-objective RL. Our approach is illustrated on several environments from the Arcade Learning Environment (Atari) benchmark and shows promising empirical results.
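To make the mixture-specific optimal transport (MW) distance mentioned in the abstract concrete, here is a minimal numerical sketch. It is not the authors' code: it follows the general MW2 construction for Gaussian mixtures (Delon and Desolneux, 2020), in which the transport plan is itself restricted to be a Gaussian mixture, so the distance reduces to a discrete optimal transport problem whose ground cost is the closed-form W$_2$ distance between individual Gaussian components (written here for one-dimensional mixtures). Whether this is the exact MW variant analyzed in the paper is an assumption.

```python
# Minimal sketch (assumption: MW2 of Delon & Desolneux, 2020), not the authors' code.
# Computes the squared MW2 distance between two 1-D Gaussian mixtures by solving a
# discrete optimal transport LP whose ground cost is the closed-form W2^2 between
# Gaussian components.
import numpy as np
from scipy.optimize import linprog


def w2_sq_gauss_1d(m0, s0, m1, s1):
    """Squared 2-Wasserstein distance between N(m0, s0^2) and N(m1, s1^2)."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


def mw2_sq(pi0, mu0, sig0, pi1, mu1, sig1):
    """Squared MW2 distance between two 1-D Gaussian mixtures.

    pi*: mixture weights; mu*, sig*: component means and standard deviations.
    """
    K, L = len(pi0), len(pi1)
    # Ground cost: pairwise W2^2 between the Gaussian components.
    C = np.array([[w2_sq_gauss_1d(mu0[k], sig0[k], mu1[l], sig1[l])
                   for l in range(L)] for k in range(K)])
    # Discrete OT: minimize <w, C> s.t. row sums = pi0, column sums = pi1, w >= 0.
    A_eq = np.zeros((K + L, K * L))
    for k in range(K):
        A_eq[k, k * L:(k + 1) * L] = 1.0   # row-sum (first-marginal) constraints
    for l in range(L):
        A_eq[K + l, l::L] = 1.0            # column-sum (second-marginal) constraints
    b_eq = np.concatenate([pi0, pi1])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun


# Example: a bimodal return distribution vs. a unimodal one.
d2 = mw2_sq(np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([0.2, 0.2]),
            np.array([1.0]), np.array([0.0]), np.array([0.5]))
print(d2)
```

In a TD-learning setting, one would apply such a metric between the predicted return mixture and the (discounted, reward-shifted) target mixture; the closed-form component cost is what keeps the loss tractable for GM representations.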

Cite

Text

Antonetti et al. "An Analysis of Distributional Reinforcement Learning with Gaussian Mixtures." Transactions on Machine Learning Research, 2026.

Markdown

[Antonetti et al. "An Analysis of Distributional Reinforcement Learning with Gaussian Mixtures." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/antonetti2026tmlr-analysis/)

BibTeX

@article{antonetti2026tmlr-analysis,
  title     = {{An Analysis of Distributional Reinforcement Learning with Gaussian Mixtures}},
  author    = {Antonetti, Mathis and Donancio, Henrique and Forbes, Florence},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/antonetti2026tmlr-analysis/}
}