A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation
Abstract
Temporal difference learning (TD) is a simple iterative algorithm widely used for policy evaluation in Markov reward processes. Bhandari et al. prove finite time convergence rates for TD learning with linear function approximation. The analysis rests on a key insight that establishes a rigorous connection between TD updates and those of online gradient descent. In a model where observations are corrupted by i.i.d. noise, convergence results for TD follow by essentially mirroring the analysis for online gradient descent. Using an information-theoretic technique, the authors also provide results for the case when TD is applied to a single Markovian data stream, where the algorithm’s updates can be severely biased. Their analysis seamlessly extends to the study of TD learning with eligibility traces and to Q-learning for high-dimensional optimal stopping problems.
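To make the setting concrete, the sketch below shows the TD(0) update with linear function approximation that the analysis studies. The feature map `phi`, transition sampler `step`, and constant step size `alpha` are illustrative assumptions for this sketch, not details taken from the paper.

```python
# A minimal sketch of TD(0) with linear value-function approximation,
# V(s) ≈ phi(s)·theta. The feature map `phi` and transition sampler
# `step` are hypothetical placeholders, not part of the paper.
import numpy as np

def td0_linear(phi, step, s0, d, alpha=0.05, gamma=0.99, n_steps=10_000):
    """Run TD(0) from initial state s0 for n_steps transitions.

    phi(s)  -> length-d feature vector (np.ndarray)
    step(s) -> sampled (reward, next_state) pair
    """
    theta = np.zeros(d)
    s = s0
    for _ in range(n_steps):
        r, s_next = step(s)
        # TD error: bootstrapped one-step target minus current estimate.
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        # Semi-gradient update; it is this step's resemblance to an online
        # gradient step that the analysis makes rigorous.
        theta += alpha * delta * phi(s)
        s = s_next
    return theta
```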
Cite
Text
Bhandari et al. "A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation." Annual Conference on Computational Learning Theory, 2018. doi:10.1287/OPRE.2020.2024
Markdown
[Bhandari et al. "A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation." Annual Conference on Computational Learning Theory, 2018.](https://mlanthology.org/colt/2018/bhandari2018colt-finite/) doi:10.1287/OPRE.2020.2024
BibTeX
@inproceedings{bhandari2018colt-finite,
title = {{A Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation}},
author = {Bhandari, Jalaj and Russo, Daniel and Singal, Raghav},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2018},
pages = {1691--1692},
doi = {10.1287/OPRE.2020.2024},
url = {https://mlanthology.org/colt/2018/bhandari2018colt-finite/}
}