Error Propagation for Approximate Policy and Value Iteration
Abstract
We address the question of how the approximation error/Bellman residual at each iteration of the Approximate Policy/Value Iteration (API/AVI) algorithms influences the quality of the resulting policy. We quantify the performance loss in terms of the Lp norm of the approximation error/Bellman residual at each iteration. Moreover, we show that the performance loss depends on the expectation of the squared Radon-Nikodym derivative of a certain distribution rather than on its supremum, in contrast to what previous results suggested. Our results also indicate that the contribution of the approximation/Bellman error to the performance loss is more prominent in the later iterations of API/AVI, while the effect of an error term from the earlier iterations decays exponentially fast.
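To make the shape of this result concrete, here is a minimal LaTeX sketch of an Lp error-propagation bound of the kind the abstract describes. It is an illustration under assumed notation, not the paper's exact theorem: Q* is the optimal action-value function, π_K the policy greedy with respect to the K-th iterate, γ the discount factor, ε_k the approximation error/Bellman residual of iteration k, ν and ρ the sampling and evaluation distributions, C_{ρ,ν} a concentrability coefficient built from expectations of squared Radon-Nikodym derivatives, and R_max a bound on the rewards.

% Sketch only (assumed notation and constants); the weights alpha_k are normalized to sum to one.
\[
  \| Q^* - Q^{\pi_K} \|_{p,\rho}
  \;\le\;
  \frac{2\gamma}{(1-\gamma)^2}
  \left[
    C_{\rho,\nu}^{1/p}
    \Bigl( \sum_{k=0}^{K-1} \alpha_k \, \| \varepsilon_k \|_{p,\nu}^{p} \Bigr)^{1/p}
    \;+\; \gamma^{K} R_{\max}
  \right],
  \qquad
  \alpha_k = \frac{(1-\gamma)\, \gamma^{K-k-1}}{1-\gamma^{K}}.
\]
% Two features of this shape mirror the abstract: (i) C_{\rho,\nu} involves an expectation of
% squared Radon-Nikodym derivatives of future state distributions with respect to \nu, rather
% than their supremum; (ii) since alpha_k is proportional to gamma^{K-k-1}, an error committed
% at an early iteration k is discounted exponentially by the time iteration K is reached, so
% late-iteration errors dominate the bound.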
Cite
Text
Farahmand et al. "Error Propagation for Approximate Policy and Value Iteration." Neural Information Processing Systems, 2010.
BibTeX
@inproceedings{farahmand2010neurips-error,
  title = {{Error Propagation for Approximate Policy and Value Iteration}},
  author = {Farahmand, Amir-massoud and Szepesvári, Csaba and Munos, Rémi},
  booktitle = {Neural Information Processing Systems},
  year = {2010},
  pages = {568--576},
  url = {https://mlanthology.org/neurips/2010/farahmand2010neurips-error/}
}