Convergence of Least Squares Temporal Difference Methods Under General Conditions
Abstract
We consider approximate policy evaluation for finite state and action Markov decision processes (MDPs) in the off-policy learning context and with the simulation-based least squares temporal difference algorithm, LSTD($\lambda$). We establish for the discounted cost criterion that off-policy LSTD($\lambda$) converges almost surely under mild, minimal conditions. We also analyze other convergence and boundedness properties of the iterates involved in the algorithm and, based on them, suggest a modification in its practical implementation. Our analysis uses theories of both finite-space Markov chains and Markov chains on topological spaces.
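As a concrete illustration of the iterates the analysis concerns, the following is a minimal sketch of off-policy LSTD($\lambda$) with importance-sampling corrections. It assumes linear features `phi`, tabular target and behavior policies `pi` and `mu` given as arrays of action probabilities, and a trajectory of transitions `(s, a, c, s_next)` with per-stage costs `c` generated under `mu`; the names and the trajectory interface are illustrative assumptions of this sketch, not notation from the paper.

```python
# Minimal sketch of off-policy LSTD(lambda) with importance-sampling
# ratios, under the assumptions stated above (illustrative only).
import numpy as np

def off_policy_lstd(trajectory, phi, pi, mu, gamma, lam):
    """trajectory: list of (s, a, c, s_next) transitions collected
    under the behavior policy mu; phi: state -> feature vector;
    pi, mu: arrays with pi[s, a] = target / behavior action probability."""
    d = phi(trajectory[0][0]).shape[0]
    A = np.zeros((d, d))   # estimate of the LSTD matrix
    b = np.zeros(d)        # estimate of the LSTD vector
    z = np.zeros(d)        # eligibility trace
    rho_prev = 1.0         # no correction carried in before the first step
    for (s, a, c, s_next) in trajectory:
        rho = pi[s, a] / mu[s, a]           # importance-sampling ratio
        # One common form of the off-policy trace recursion:
        # z_t = gamma * lam * rho_{t-1} * z_{t-1} + phi(s_t)
        z = gamma * lam * rho_prev * z + phi(s)
        # rho corrects the next-state feature and the observed cost,
        # both of which depend on the action drawn from mu.
        A += np.outer(z, phi(s) - gamma * rho * phi(s_next))
        b += rho * c * z
        rho_prev = rho
    T = len(trajectory)
    # Least-squares solve of (A/T) theta = (b/T); Phi @ theta then
    # approximates the discounted cost of the target policy.
    theta, *_ = np.linalg.lstsq(A / T, b / T, rcond=None)
    return theta
```

Solving the final system with `lstsq` rather than a plain matrix inverse is a simple safeguard for early iterations, where the matrix estimate can be singular or poorly conditioned; the boundedness issues that motivate such safeguards, and the paper's own suggested implementation modification, are treated in the paper itself.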
Cite
Text
Yu. "Convergence of Least Squares Temporal Difference Methods Under General Conditions." International Conference on Machine Learning, 2010.Markdown
[Yu. "Convergence of Least Squares Temporal Difference Methods Under General Conditions." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/yu2010icml-convergence/)BibTeX
@inproceedings{yu2010icml-convergence,
title = {{Convergence of Least Squares Temporal Difference Methods Under General Conditions}},
author = {Yu, Huizhen},
booktitle = {International Conference on Machine Learning},
year = {2010},
pages = {1207--1214},
url = {https://mlanthology.org/icml/2010/yu2010icml-convergence/}
}