On the Rate of Convergence and Error Bounds for LSTD(λ)
Abstract
We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002), which computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm, which extends (and slightly improves) the bound derived by Lazaric et al. (2012) in the specific case λ = 0. In the context of temporal-difference algorithms with value-function approximation, this analysis is, to our knowledge, the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.
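As a concrete illustration of the algorithm under study, the following is a minimal NumPy sketch of LSTD(λ) run on a single sampled trajectory. The function name `lstd_lambda`, the ridge term `reg`, and the array interface are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def lstd_lambda(features, rewards, gamma=0.95, lam=0.5, reg=1e-6):
    """Sketch of LSTD(lambda) on one trajectory (illustrative, not the paper's code).

    features : (T+1, d) array of feature vectors phi(s_0), ..., phi(s_T)
    rewards  : (T,) array of rewards observed along the trajectory
    Returns theta such that V(s) is approximated by phi(s) . theta
    for the fixed policy that generated the trajectory.
    """
    T, d = len(rewards), features.shape[1]
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)                       # eligibility trace
    for t in range(T):
        phi, phi_next = features[t], features[t + 1]
        z = gamma * lam * z + phi         # decay and accumulate the trace
        A += np.outer(z, phi - gamma * phi_next)
        b += z * rewards[t]
    # Small ridge term (an assumption here) to keep the solve well posed;
    # the paper studies how fast the empirical system converges to its limit.
    return np.linalg.solve(A / T + reg * np.eye(d), b / T)
```

With `lam=0` this reduces to plain LSTD, the case analyzed by Lazaric et al. (2012) that the paper extends.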
Cite
Text
Tagorti and Scherrer. "On the Rate of Convergence and Error Bounds for LSTD(λ)." International Conference on Machine Learning, 2015.
BibTeX
@inproceedings{tagorti2015icml-rate,
title = {{On the Rate of Convergence and Error Bounds for LSTD(λ)}},
author = {Tagorti, Manel and Scherrer, Bruno},
booktitle = {International Conference on Machine Learning},
year = {2015},
pages = {1521-1529},
volume = {37},
url = {https://mlanthology.org/icml/2015/tagorti2015icml-rate/}
}