Online Bellman Residual Algorithms with Predictive Error Guarantees
Abstract
We establish a connection between optimizing the Bellman Residual and worst-case long-term predictive error. In the online learning framework, learning takes place over a sequence of trials with the goal of predicting a future discounted sum of rewards. Our analysis shows that, together with a stability assumption, any no-regret online learning algorithm that minimizes Bellman error ensures small prediction error. No statistical assumptions are made on the sequence of observations, which could be non-Markovian or even adversarial. Moreover, the analysis is independent of the particular form of function approximation and the particular (stable) no-regret approach taken. Our approach thus establishes a broad new family of provably sound algorithms for Bellman Residual-based learning and provides a generalization of previous worst-case results for minimizing predictive error. We investigate the potential advantages of some members of this family both theoretically and empirically on benchmark problems.
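As a concrete illustration of the setting the abstract describes, the sketch below runs a no-regret online learner (plain online gradient descent, chosen here only for illustration) on the squared Bellman residual with linear function approximation over a stream of observations. The feature map, observation process, and reward signal are all hypothetical stand-ins, not the paper's benchmarks or its exact algorithm.

```python
import numpy as np

# Minimal sketch, assuming: linear value function V(x) = w @ phi(x), a toy
# scalar observation stream, and online gradient descent as the no-regret
# learner. The paper's guarantees cover any stable no-regret algorithm.

gamma = 0.9      # discount factor
eta = 0.1        # OGD step size
rng = np.random.default_rng(0)

def features(x):
    # hypothetical feature map for a scalar observation
    return np.array([1.0, x, x**2, np.sin(x)])

w = np.zeros(4)  # linear value-function weights
losses = []
x = rng.normal()
for t in range(2000):
    x_next = 0.5 * x + rng.normal(scale=0.1)   # arbitrary (even adversarial) stream
    r = np.cos(x)                              # arbitrary reward signal
    phi, phi_next = features(x), features(x_next)
    # Bellman residual: V(x) - (r + gamma * V(x'))
    delta = w @ phi - (r + gamma * (w @ phi_next))
    # gradient of the loss 0.5 * delta^2 w.r.t. w
    # (full residual gradient, i.e. both V(x) and V(x') terms, unlike TD)
    w -= eta * delta * (phi - gamma * phi_next)
    losses.append(0.5 * delta**2)
    x = x_next
```

The per-trial squared residuals in `losses` shrink over time; the paper's contribution is showing that such no-regret behavior on Bellman error, plus stability, bounds the worst-case long-term prediction error.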
Cite
Text
Sun and Bagnell. "Online Bellman Residual Algorithms with Predictive Error Guarantees." Conference on Uncertainty in Artificial Intelligence, 2015. doi:10.1184/R1/6557087.V1

Markdown

[Sun and Bagnell. "Online Bellman Residual Algorithms with Predictive Error Guarantees." Conference on Uncertainty in Artificial Intelligence, 2015.](https://mlanthology.org/uai/2015/sun2015uai-online/) doi:10.1184/R1/6557087.V1

BibTeX
@inproceedings{sun2015uai-online,
title = {{Online Bellman Residual Algorithms with Predictive Error Guarantees}},
author = {Sun, Wen and Bagnell, J. Andrew},
booktitle = {Conference on Uncertainty in Artificial Intelligence},
year = {2015},
pages = {852--861},
doi = {10.1184/R1/6557087.V1},
url = {https://mlanthology.org/uai/2015/sun2015uai-online/}
}