Epistemic Bellman Operators

Abstract

Uncertainty quantification remains a difficult challenge in reinforcement learning. Several algorithms successfully quantify uncertainty in practical settings. However, it is unclear whether these algorithms are theoretically sound and can be expected to converge. Furthermore, they appear to treat the uncertainty in the target parameters in different ways. In this work, we unify several practical algorithms into one theoretical framework by defining a new Bellman operator on distributions, and we show that this Bellman operator is a contraction. We highlight use cases of our framework by analyzing an existing Bayesian Q-learning algorithm, and we introduce a novel uncertainty-aware variant of PPO that adaptively sets its clipping hyperparameter.
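
For readers unfamiliar with the contraction property mentioned in the abstract, the following is a minimal sketch of what it means for the classical Bellman optimality operator on Q-tables. It is not the distributional operator defined in the paper. The random MDP, the function name `bellman_optimality_operator`, and all variable names are illustrative assumptions. The check is that applying the operator to two arbitrary Q-tables shrinks their sup-norm distance by at least a factor of the discount `gamma`.

```python
import numpy as np


def bellman_optimality_operator(q, rewards, transitions, gamma):
    """Apply the classical Bellman optimality operator to a Q-table.

    q:           (S, A) array of action values
    rewards:     (S, A) array of expected immediate rewards
    transitions: (S, A, S) array; transitions[s, a] is a distribution over next states
    gamma:       discount factor in [0, 1)
    """
    # Greedy value of every next state, shape (S,).
    next_state_values = q.max(axis=1)
    # Expected discounted next-state value for each (s, a), shape (S, A).
    return rewards + gamma * (transitions @ next_state_values)


# Hypothetical random MDP, used only for illustration.
rng = np.random.default_rng(0)
num_states, num_actions, gamma = 5, 3, 0.9

rewards = rng.normal(size=(num_states, num_actions))
transitions = rng.random(size=(num_states, num_actions, num_states))
transitions /= transitions.sum(axis=-1, keepdims=True)  # rows sum to 1

# Two arbitrary Q-tables.
q1 = rng.normal(size=(num_states, num_actions))
q2 = rng.normal(size=(num_states, num_actions))

dist_before = np.abs(q1 - q2).max()
dist_after = np.abs(
    bellman_optimality_operator(q1, rewards, transitions, gamma)
    - bellman_optimality_operator(q2, rewards, transitions, gamma)
).max()

# Contraction in the sup norm: ||T q1 - T q2|| <= gamma * ||q1 - q2||.
contraction_holds = dist_after <= gamma * dist_before + 1e-12
print(f"before={dist_before:.4f}, after={dist_after:.4f}, holds={contraction_holds}")
```

Because the operator is a contraction, repeatedly applying it from any starting Q-table converges to a unique fixed point. This is the kind of guarantee the abstract claims for its operator on distributions.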

Cite

Text

van der Vaart et al. "Epistemic Bellman Operators." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I20.35393

Markdown

[van der Vaart et al. "Epistemic Bellman Operators." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/vandervaart2025aaai-epistemic/) doi:10.1609/AAAI.V39I20.35393

BibTeX

@inproceedings{vandervaart2025aaai-epistemic,
  title     = {{Epistemic Bellman Operators}},
  author    = {van der Vaart, Pascal R. and Spaan, Matthijs T. J. and Yorke-Smith, Neil},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {20973-20981},
  doi       = {10.1609/AAAI.V39I20.35393},
  url       = {https://mlanthology.org/aaai/2025/vandervaart2025aaai-epistemic/}
}