Robust Value Function Approximation Using Bilinear Programming
Abstract
Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, it provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP hard, but this is unavoidable because the Bellman-residual minimization itself is NP hard. We, therefore, employ and analyze a common approximate algorithm for bilinear programs. The analysis shows that this algorithm offers a convergent generalization of approximate policy iteration. Finally, we demonstrate that the proposed approach can consistently minimize the Bellman residual on a simple benchmark problem.
Cite
Text
Petrik and Zilberstein. "Robust Value Function Approximation Using Bilinear Programming." Neural Information Processing Systems, 2009.Markdown
[Petrik and Zilberstein. "Robust Value Function Approximation Using Bilinear Programming." Neural Information Processing Systems, 2009.](https://mlanthology.org/neurips/2009/petrik2009neurips-robust/)BibTeX
@inproceedings{petrik2009neurips-robust,
title = {{Robust Value Function Approximation Using Bilinear Programming}},
author = {Petrik, Marek and Zilberstein, Shlomo},
booktitle = {Neural Information Processing Systems},
year = {2009},
pages = {1446-1454},
url = {https://mlanthology.org/neurips/2009/petrik2009neurips-robust/}
}