Analysis of a Classification-Based Policy Iteration Algorithm
Abstract
We present a classification-based policy iteration algorithm, called Direct Policy Iteration, and provide its finite-sample analysis. Our results state a performance bound in terms of the number of policy improvement steps, the number of rollouts used in each iteration, the capacity of the considered policy space, and a new capacity measure which indicates how well the policy space can approximate policies that are greedy w.r.t. any of its members. The analysis reveals a tradeoff between the estimation and approximation errors in this classification-based policy iteration setting. We also study the consistency of the method when there exists a sequence of policy spaces with increasing capacity.
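As a rough illustration of the loop the abstract describes (rollouts to estimate action values, then a classification-style step that picks a policy from a restricted policy space), here is a minimal sketch. It is not the authors' implementation: the chain MDP, the threshold policy space, the constants, and every function name are hypothetical choices made for this example, and the regret-weighted empirical loss is an assumed form of the classification objective.

```python
import random

# Hypothetical toy problem (not from the paper): a deterministic chain MDP.
N_STATES = 6          # states 0..5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor
HORIZON = 20          # rollout truncation length


def step(state, action):
    """Move along the chain; reward 1 for landing on the right end."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def rollout_q(state, action, policy):
    """Truncated rollout estimate of Q^policy(state, action)."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(HORIZON):
        s, r = step(s, a)
        total += discount * r
        discount *= GAMMA
        a = policy(s)  # after the first action, follow the current policy
    return total


def threshold_policy(k):
    """Member of the assumed policy space: go right iff state >= k."""
    return lambda s: 1 if s >= k else 0


POLICY_SPACE = [threshold_policy(k) for k in range(N_STATES + 1)]


def improve(policy, states):
    """Pick the policy in the space with the smallest empirical loss,
    where each sampled state is weighted by its estimated regret."""
    q = {s: [rollout_q(s, a, policy) for a in ACTIONS] for s in set(states)}

    def loss(candidate):
        return sum(max(q[s]) - q[s][candidate(s)] for s in states)

    return min(POLICY_SPACE, key=loss)


def direct_policy_iteration(n_iterations=5, n_states_sampled=30, seed=0):
    rng = random.Random(seed)
    policy = threshold_policy(N_STATES)  # initial policy: always move left
    for _ in range(n_iterations):
        states = [rng.randrange(N_STATES) for _ in range(n_states_sampled)]
        policy = improve(policy, states)
    return policy


final = direct_policy_iteration()
print([final(s) for s in range(N_STATES)])
```

On this toy chain the loop settles on the always-move-right policy. The quantities the abstract names map onto `n_iterations` (policy improvement steps), `n_states_sampled` and `HORIZON` (rollout budget), and the size of `POLICY_SPACE` (capacity of the policy space).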
Cite
Text
Lazaric et al. "Analysis of a Classification-Based Policy Iteration Algorithm." International Conference on Machine Learning, 2010.
Markdown
[Lazaric et al. "Analysis of a Classification-Based Policy Iteration Algorithm." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/lazaric2010icml-analysis/)
BibTeX
@inproceedings{lazaric2010icml-analysis,
title = {{Analysis of a Classification-Based Policy Iteration Algorithm}},
author = {Lazaric, Alessandro and Ghavamzadeh, Mohammad and Munos, Rémi},
booktitle = {International Conference on Machine Learning},
year = {2010},
pages = {607-614},
url = {https://mlanthology.org/icml/2010/lazaric2010icml-analysis/}
}