Preconditioned Temporal Difference Learning

Abstract

This paper extends many of the recently popular reinforcement learning (RL) algorithms to a generalized framework that includes least-squares temporal difference (LSTD) learning, least-squares policy evaluation (LSPE), and a variant of incremental LSTD (iLSTD). The basis of this extension is a preconditioning technique that aims to solve a stochastic model equation. The paper also studies three significant issues of the new framework: it presents a new step-size rule that can be computed online, provides an iterative way to apply preconditioning, and reduces the complexity of the related algorithms to near that of temporal difference (TD) learning.
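The preconditioning idea the abstract describes can be illustrated with a minimal sketch. The assumptions here are mine, not taken from the paper: a hypothetical 3-state Markov reward process with tabular features, the linear model equation $A\theta = b$ that TD methods solve in expectation, and an LSPE-style preconditioner $C = \Phi^\top D \Phi$ (choosing $C = A$ instead would recover an LSTD-style solve):

```python
import numpy as np

# Hypothetical 3-state Markov reward process (illustrative only, not from the paper).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])   # transition matrix
r = np.array([1.0, 0.0, 2.0])     # expected one-step rewards
gamma = 0.9
d = np.ones(3) / 3.0              # assumed uniform state weighting
D = np.diag(d)
Phi = np.eye(3)                   # tabular features, so the TD fixed point is exact

# Model equation A theta = b, whose solution is the TD fixed point.
A = Phi.T @ D @ (Phi - gamma * P @ Phi)
b = Phi.T @ D @ r

# LSPE-style preconditioner: the feature covariance under d.
C = Phi.T @ D @ Phi

# Preconditioned iteration: theta <- theta + C^{-1} (b - A theta).
theta = np.zeros(3)
for _ in range(200):
    theta = theta + np.linalg.solve(C, b - A @ theta)

# With tabular features the fixed point equals the true value function.
v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
print(np.max(np.abs(theta - v_true)))  # small residual after 200 sweeps
```

In this sketch $A$, $b$, and $C$ are computed exactly for clarity; the algorithms in the paper instead estimate these quantities from sampled transitions, which is where the stochastic model equation and the online step-size rule come in.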

Cite

Text

Yao and Liu. "Preconditioned Temporal Difference Learning." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390308

Markdown

[Yao and Liu. "Preconditioned Temporal Difference Learning." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/yao2008icml-preconditioned/) doi:10.1145/1390156.1390308

BibTeX

@inproceedings{yao2008icml-preconditioned,
  title     = {{Preconditioned Temporal Difference Learning}},
  author    = {Yao, Hengshuai and Liu, Zhi-Qiang},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {1208--1215},
  doi       = {10.1145/1390156.1390308},
  url       = {https://mlanthology.org/icml/2008/yao2008icml-preconditioned/}
}