Error Bounds for Approximate Value Iteration

Abstract

Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations Vn is processed iteratively by Vn+1 = ATVn, where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted Lp-norms (p ≥ 1) of the approximation errors. The results extend the usual analysis in L∞-norm and allow one to relate the performance of AVI to the approximation power (usually expressed in Lp-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
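The iteration Vn+1 = ATVn described in the abstract can be sketched on a toy problem. The following is a minimal illustration, not the paper's experimental setup: the MDP (3 states, 2 actions, random transitions and rewards) is invented for the example, and the approximation operator A is taken to be a least-squares projection onto a small fixed feature basis, standing in for the SL step.

```python
import numpy as np

# Hypothetical toy MDP for illustration: 3 states, 2 actions.
# P[a, s, :] is the transition distribution from state s under action a;
# R[a, s] is the expected immediate reward.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))   # shape (2, 3, 3)
R = rng.uniform(0.0, 1.0, size=(2, 3))       # shape (2, 3)
gamma = 0.9

def bellman(V):
    """Bellman optimality operator T: (TV)(s) = max_a [R(a,s) + gamma * E[V]]."""
    return np.max(R + gamma * (P @ V), axis=0)

# Approximation operator A: least-squares projection onto the span of a
# fixed feature matrix Phi (2 features per state), playing the role of
# the supervised learning algorithm.
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])

def approx(V):
    w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return Phi @ w

# AVI: V_{n+1} = A T V_n, starting from V_0 = 0.
V = np.zeros(3)
for _ in range(200):
    V = approx(bellman(V))
```

Because A introduces an approximation error at every step, the iterates need not reach the optimal value function; the paper's contribution is to bound the resulting performance loss in weighted Lp-norm rather than only in L∞-norm.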

Cite

Text

Munos. "Error Bounds for Approximate Value Iteration." AAAI Conference on Artificial Intelligence, 2005.

Markdown

[Munos. "Error Bounds for Approximate Value Iteration." AAAI Conference on Artificial Intelligence, 2005.](https://mlanthology.org/aaai/2005/munos2005aaai-error/)

BibTeX

@inproceedings{munos2005aaai-error,
  title     = {{Error Bounds for Approximate Value Iteration}},
  author    = {Munos, Rémi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2005},
  pages     = {1006--1011},
  url       = {https://mlanthology.org/aaai/2005/munos2005aaai-error/}
}