Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation
Abstract
Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In addition, the intractable element of the infinite dimensionality of distributions has been overlooked. In this paper, we present a regret analysis of distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a key notion of Bellman unbiasedness which is essential for exactly learnable and provably efficient distributional updates in an online manner. Among all types of statistical functionals for representing infinite-dimensional return distributions, our theoretical results demonstrate that only moment functionals can exactly capture the statistical information. Secondly, we propose a provably efficient algorithm, SF-LSVI, that achieves a tight regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$ where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of a function class.
Cite
Text
Cho et al. "Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.Markdown
[Cho et al. "Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/cho2025icml-bellman/)BibTeX
@inproceedings{cho2025icml-bellman,
title = {{Bellman Unbiasedness: Toward Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation}},
author = {Cho, Taehyun and Han, Seungyub and Ju, Seokhun and Kim, Dohyeong and Lee, Kyungjae and Lee, Jungwoo},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {10499-10523},
volume = {267},
url = {https://mlanthology.org/icml/2025/cho2025icml-bellman/}
}