Least-Squares Policy Iteration: Bias-Variance Trade-Off in Control Problems
Abstract
In the context of large-state-space MDPs with linear value function approximation, we introduce a new approximate version of λ-Policy Iteration (Bertsekas and Ioffe, 1996), a method that generalizes Value Iteration and Policy Iteration. Our approach, called Least-Squares λ-Policy Iteration, generalizes LSPI (Lagoudakis & Parr, 2003), which makes efficient use of training samples compared to classical temporal-difference methods. The motivation of our work is to exploit the λ parameter within the least-squares context, without having to generate new samples at each iteration or to know a model of the MDP. We provide a performance bound that shows the soundness of the algorithm. We show empirically, on a simple chain problem and on the Tetris game, that the λ parameter acts as a bias-variance trade-off that may improve the convergence and the performance of the policy obtained.
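To make the bias-variance role of λ concrete, here is a minimal sketch (not code from the paper) of the λ-return that underlies this trade-off: it blends all n-step returns of an episodic trajectory, with λ = 0 recovering the one-step TD target (biased, low variance) and λ = 1 the Monte Carlo return (unbiased, high variance). The function name and the assumption of an episodic trajectory (terminal value 0) are ours.

```python
def lambda_return(rewards, values, gamma, lam):
    """Lambda-return G^lam for one finite trajectory.

    rewards: [r_1, ..., r_T] received along the trajectory.
    values:  [V(s_1), ..., V(s_T)] bootstrapped estimates of successor
             states (use 0.0 for a terminal state).
    Blends the n-step returns G^(n) = r_1 + ... + gamma^(n-1) r_n + gamma^n V(s_n):
        G^lam = (1 - lam) * sum_{n=1}^{T-1} lam^(n-1) G^(n) + lam^(T-1) G^(T)
    """
    T = len(rewards)
    # Build all n-step returns G^(1), ..., G^(T).
    G, running = [], 0.0
    for n in range(T):
        running += gamma ** n * rewards[n]
        G.append(running + gamma ** (n + 1) * values[n])
    # Geometric (1 - lam) * lam^(n-1) weighting of the first T-1 returns,
    # with the remaining lam^(T-1) mass on the full-length return.
    out = sum((1 - lam) * lam ** n * G[n] for n in range(T - 1))
    out += lam ** (T - 1) * G[-1]
    return out
```

With λ = 0 this returns r_1 + γV(s_1), the TD(0) target; with λ = 1 it returns the undiscounted sum of rewards plus the terminal bootstrap, i.e. the Monte Carlo return for an episodic trajectory; intermediate λ interpolates between the two.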
Cite
Text
Thiery and Scherrer. "Least-Squares Policy Iteration: Bias-Variance Trade-Off in Control Problems." International Conference on Machine Learning, 2010.
BibTeX
@inproceedings{thiery2010icml-least,
  title     = {{Least-Squares Policy Iteration: Bias-Variance Trade-Off in Control Problems}},
  author    = {Thiery, Christophe and Scherrer, Bruno},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {1071--1078},
  url       = {https://mlanthology.org/icml/2010/thiery2010icml-least/}
}