Distributional Off-Policy Evaluation with Bellman Residual Minimization

Abstract

We study distributional off-policy evaluation (OPE), whose goal is to learn the distribution of the return for a target policy using offline data generated by a different policy. The theoretical foundation of many existing works relies on supremum-extended statistical distances, such as the supremum-Wasserstein distance, which are hard to estimate. In contrast, we study the more manageable expectation-extended statistical distances and provide a novel theoretical justification of their validity for learning the return distribution. Based on this attractive property, we propose a new method called Energy Bellman Residual Minimizer (EBRM) for distributional OPE and analyze it in depth. We establish a finite-sample error bound for the EBRM estimator under the realizability assumption. Furthermore, we introduce a multi-step variant of our method that improves the error bound in non-realizable settings. Notably, unlike prior distributional OPE methods, the theoretical guarantees of our method do not require the completeness assumption.
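To make the contrast between the two kinds of extended distance concrete, the following is a minimal sketch and not the paper's EBRM estimator. It assumes that return distributions are represented by finite samples per state-action pair, that the base distance is the common energy statistic 2E|X-Y| - E|X-X'| - E|Y-Y'| (conventions differ on squaring and scaling), and that a state-action weighting is given. The names `mean_abs_diff`, `energy_distance`, `extended_distances`, and the toy inputs are hypothetical.

```python
from itertools import product


def mean_abs_diff(xs, ys):
    """Average |x - y| over all pairs, a plug-in estimate of E|X - Y|."""
    return sum(abs(x - y) for x, y in product(xs, ys)) / (len(xs) * len(ys))


def energy_distance(xs, ys):
    """Plug-in energy statistic 2E|X-Y| - E|X-X'| - E|Y-Y'| for 1-D samples."""
    return (
        2 * mean_abs_diff(xs, ys)
        - mean_abs_diff(xs, xs)
        - mean_abs_diff(ys, ys)
    )


def extended_distances(dist_a, dist_b, weights):
    """Return (expectation-extended, supremum-extended) distances.

    dist_a, dist_b: dict mapping a state-action pair to a list of return samples.
    weights: dict mapping the same pairs to probabilities (assumed to sum to 1).
    """
    per_pair = {sa: energy_distance(dist_a[sa], dist_b[sa]) for sa in weights}
    expectation = sum(weights[sa] * d for sa, d in per_pair.items())
    supremum = max(per_pair.values())
    return expectation, supremum


# Toy example: the two estimates disagree only on a rarely visited pair.
dist_a = {("s0", "a0"): [0.0], ("s1", "a0"): [0.0, 1.0]}
dist_b = {("s0", "a0"): [1.0], ("s1", "a0"): [0.0, 1.0]}
weights = {("s0", "a0"): 0.25, ("s1", "a0"): 0.75}

expectation, supremum = extended_distances(dist_a, dist_b, weights)
```

In this toy case the supremum version is driven entirely by the single worst state-action pair, while the expectation version averages the per-pair distances under the weighting, so it can be estimated from sampled state-action pairs.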

Cite

Text

Hong et al. "Distributional Off-Policy Evaluation with Bellman Residual Minimization." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Hong et al. "Distributional Off-Policy Evaluation with Bellman Residual Minimization." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/hong2025aistats-distributional/)

BibTeX

@inproceedings{hong2025aistats-distributional,
  title     = {{Distributional Off-Policy Evaluation with Bellman Residual Minimization}},
  author    = {Hong, Sungee and Qi, Zhengling and Wong, Raymond K. W.},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {4006--4014},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/hong2025aistats-distributional/}
}