Analysis of a Classification-Based Policy Iteration Algorithm

Lazaric, Alessandro; Ghavamzadeh, Mohammad; Munos, Rémi

ICML 2010 pp. 607-614

Abstract

We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
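The loop the abstract describes (estimate action values by rollouts, then choose the policy in a restricted space that best matches the greedy choices) can be illustrated with a toy sketch. Nothing below is taken from the paper beyond the ingredients named in the abstract: the five-state chain, the threshold policy space, the use of every state as the rollout set, the gap-weighted loss, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random

N_STATES = 5        # hypothetical toy chain: states 0..4, reward for landing on state 4
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (assumed)

def step(state, action, rng, slip=0.0):
    """One transition of the toy chain; with probability `slip` the action is flipped."""
    if rng.random() < slip:
        action = 1 - action
    nxt = min(N_STATES - 1, state + 1) if action == 1 else max(0, state - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def rollout_q(state, action, policy, rng, n_rollouts=10, horizon=20, slip=0.0):
    """Monte Carlo estimate of Q^policy(state, action) from truncated rollouts."""
    total = 0.0
    for _ in range(n_rollouts):
        s, a, ret, disc = state, action, 0.0, 1.0
        for _ in range(horizon):
            s, r = step(s, a, rng, slip)
            ret += disc * r
            disc *= GAMMA
            a = policy[s]           # after the first action, follow the current policy
        total += ret
    return total / n_rollouts

def policy_space():
    """Hypothetical restricted policy space: move right iff state >= threshold."""
    for t in range(N_STATES + 1):
        yield tuple(1 if s >= t else 0 for s in range(N_STATES))

def improve(policy, rollout_states, rng, **kw):
    """Pick the policy in the space with the smallest empirical greedy-gap loss."""
    q = {s: [rollout_q(s, a, policy, rng, **kw) for a in ACTIONS]
         for s in rollout_states}

    def loss(candidate):
        # Average gap between the best estimated action value and the chosen one.
        gaps = (max(q[s]) - q[s][candidate[s]] for s in rollout_states)
        return sum(gaps) / len(rollout_states)

    return min(policy_space(), key=loss)

def classification_policy_iteration(n_iterations=5, seed=0, **kw):
    """Repeat rollout estimation and classification-style improvement."""
    rng = random.Random(seed)
    policy = tuple(0 for _ in range(N_STATES))   # start from "always move left"
    rollout_states = list(range(N_STATES))       # assumption: every state is a rollout state
    for _ in range(n_iterations):
        policy = improve(policy, rollout_states, rng, **kw)
    return policy

if __name__ == "__main__":
    print(classification_policy_iteration())
```

In this sketch the knobs mirror the quantities the abstract lists: `n_iterations` is the number of policy improvement steps, `n_rollouts` is the number of rollouts per estimate, and `policy_space` fixes the capacity of the policy class. Shrinking the policy space so that it no longer contains a greedy policy, or raising `slip` while keeping `n_rollouts` small, are simple ways to see approximation and estimation errors appear separately.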

Cite

Text

Lazaric et al. "Analysis of a Classification-Based Policy Iteration Algorithm." International Conference on Machine Learning, 2010.

Markdown

[Lazaric et al. "Analysis of a Classification-Based Policy Iteration Algorithm." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/lazaric2010icml-analysis/)

BibTeX

@inproceedings{lazaric2010icml-analysis,
  title     = {{Analysis of a Classification-Based Policy Iteration Algorithm}},
  author    = {Lazaric, Alessandro and Ghavamzadeh, Mohammad and Munos, Rémi},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {607--614},
  url       = {https://mlanthology.org/icml/2010/lazaric2010icml-analysis/}
}