A Fast and Reliable Policy Improvement Algorithm

Abstract

We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.
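The abstract describes improving a stochastic policy at the cost of value estimation alone. As a rough illustration of that general setting (not the paper's algorithm), the sketch below runs exact policy evaluation on a tiny hypothetical MDP and then applies a conservative improvement step that mixes the current policy with the greedy one; the transition tensor `P`, rewards `R`, and mixing weight `alpha` are all made-up for the example.

```python
import numpy as np

# Toy 2-state, 2-action MDP (hypothetical numbers, for illustration only).
n_states, n_actions, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],    # P[s, a, s'] transition probs
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],                  # R[s, a] expected rewards
              [0.0, 2.0]])

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    P_pi = np.einsum('sa,sax->sx', pi, P)
    r_pi = np.einsum('sa,sa->s', pi, R)
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def improve(pi, alpha=0.5):
    """Conservative improvement: mix pi with the greedy policy w.r.t. Q^pi."""
    v = evaluate(pi)
    Q = R + gamma * np.einsum('sax,x->sa', P, v)
    greedy = np.eye(n_actions)[Q.argmax(axis=1)]    # one-hot greedy policy
    return (1 - alpha) * pi + alpha * greedy

pi = np.full((n_states, n_actions), 0.5)   # uniform initial policy
for _ in range(20):
    pi = improve(pi)
print(evaluate(pi))   # state values of the improved policy
```

With exact evaluation, each conservative step cannot decrease the value, so the loop converges toward the optimal policy; the paper's contribution concerns what happens when evaluation is only approximate.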

Cite

Text

Abbasi-Yadkori et al. "A Fast and Reliable Policy Improvement Algorithm." International Conference on Artificial Intelligence and Statistics, 2016.

Markdown

[Abbasi-Yadkori et al. "A Fast and Reliable Policy Improvement Algorithm." International Conference on Artificial Intelligence and Statistics, 2016.](https://mlanthology.org/aistats/2016/abbasiyadkori2016aistats-fast/)

BibTeX

@inproceedings{abbasiyadkori2016aistats-fast,
  title     = {{A Fast and Reliable Policy Improvement Algorithm}},
  author    = {Abbasi-Yadkori, Yasin and Bartlett, Peter L. and Wright, Stephen J.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2016},
  pages     = {1338--1346},
  url       = {https://mlanthology.org/aistats/2016/abbasiyadkori2016aistats-fast/}
}