Error Bounds for Approximate Value Iteration
Abstract
Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations Vn is processed iteratively by Vn+1 = AT Vn, where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted Lp-norms (p ≥ 1) of the approximation errors. The results extend the usual analysis in L∞-norm, and allow us to relate the performance of AVI to the approximation power (usually expressed in Lp-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
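The iteration Vn+1 = AT Vn described above can be sketched on a toy problem. The following is a minimal illustration, not the paper's method: it uses a hypothetical random 3-state, 2-action MDP, and takes A to be state aggregation (averaging the value over groups of states), a nonexpansive approximation operator under which the composed map A∘T remains a γ-contraction in sup-norm, so the iterates converge. All names, the feature of grouping states {0, 1}, and the MDP data are illustrative assumptions.

```python
import numpy as np

# Hypothetical random MDP: P[a, s] is a distribution over next states,
# R[a, s] is the immediate reward for action a in state s.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = rng.random((n_actions, n_states))

def bellman(V):
    """Bellman optimality operator T: (TV)(s) = max_a [R(a,s) + gamma * E_{s'}[V(s')]]."""
    return np.max(R + gamma * P @ V, axis=0)

def approx(V):
    """Approximation operator A: state aggregation over illustrative groups {0,1} and {2}.
    Averaging within groups is nonexpansive in sup-norm, so A(T(.)) is a gamma-contraction."""
    Va = V.copy()
    for group in ([0, 1], [2]):
        Va[group] = V[group].mean()
    return Va

# AVI loop: V_{n+1} = A T V_n.
V = np.zeros(n_states)
for _ in range(200):
    V = approx(bellman(V))

# Policy induced by the final value representation.
greedy_policy = np.argmax(R + gamma * P @ V, axis=0)
```

With this choice of A, the iterates settle at the fixed point of A∘T; the paper's bounds control how far the greedy policy derived from such a fixed point can be from optimal, in terms of the weighted Lp-norms of the per-step approximation errors AT Vn − T Vn.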
Cite
Text
Munos. "Error Bounds for Approximate Value Iteration." AAAI Conference on Artificial Intelligence, 2005.
Markdown
[Munos. "Error Bounds for Approximate Value Iteration." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/munos2005aaai-error/)
BibTeX
@inproceedings{munos2005aaai-error,
title = {{Error Bounds for Approximate Value Iteration}},
author = {Munos, Rémi},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2005},
pages = {1006-1011},
url = {https://mlanthology.org/aaai/2005/munos2005aaai-error/}
}