A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees

Abstract

We introduce a new algorithm based on linear programming that approximates the differential value function of an average-cost Markov decision process via a linear combination of pre-selected basis functions. The algorithm carries out a form of cost shaping and minimizes a version of Bellman error. We establish an error bound that scales gracefully with the number of states without imposing the (strong) Lyapunov condition required by its counter- part in [6]. We propose a path-following method that automates selection of important algorithm parameters which represent coun- terparts to the "state-relevance weights" studied in [6].

Cite

Text

Farias and Roy. "A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees." Neural Information Processing Systems, 2004.

Markdown

[Farias and Roy. "A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees." Neural Information Processing Systems, 2004.](https://mlanthology.org/neurips/2004/farias2004neurips-costshaping/)

BibTeX

@inproceedings{farias2004neurips-costshaping,
  title     = {{A Cost-Shaping LP for Bellman Error Minimization with Performance Guarantees}},
  author    = {Farias, Daniela D. and Roy, Benjamin V.},
  booktitle = {Neural Information Processing Systems},
  year      = {2004},
  pages     = {417-424},
  url       = {https://mlanthology.org/neurips/2004/farias2004neurips-costshaping/}
}