Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation: Tail Averaging and Regularisation

Abstract

We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging. We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that does not require information about the eigenvalues of the matrix underlying the projected TD fixed point. Our analysis shows that tail-averaged TD converges at the optimal O(1/t) rate, both in expectation and with high probability. In addition, our bounds exhibit a sharper rate of decay for the initial error (bias), which is an improvement over averaging all iterates. We also propose and analyse a variant of TD that incorporates regularisation, and show that this variant fares favourably in problems with ill-conditioned features.
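For intuition, the following is a minimal sketch (not the authors' implementation) of tail-averaged TD(0) with linear function approximation and an optional L2 regularisation term. The transition generator `env_step`, feature map `phi`, and constant step size `alpha` are illustrative assumptions; averaging is taken only over the second half of the iterates, which is what "tail-averaging" refers to in the abstract.

```python
import numpy as np

def tail_averaged_td(env_step, phi, theta0, gamma=0.99, alpha=0.1,
                     num_iters=10_000, reg_lambda=0.0):
    """Sketch of TD(0) with linear function approximation, tail-averaging,
    and optional regularisation (reg_lambda = 0 recovers plain TD(0)).
    `env_step()` is assumed to yield (state, reward, next_state) transitions
    generated by the behaviour policy."""
    theta = theta0.copy()
    tail_start = num_iters // 2            # average only the last half of iterates
    tail_sum = np.zeros_like(theta)
    for t, (s, r, s_next) in zip(range(num_iters), env_step()):
        # TD error with linear value estimate phi(s)^T theta
        delta = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        # regularised TD(0) update with a constant, eigenvalue-free step size
        theta += alpha * (delta * phi(s) - reg_lambda * theta)
        if t >= tail_start:
            tail_sum += theta
    return tail_sum / (num_iters - tail_start)
```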

Cite

Text

Patil et al. "Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation: Tail Averaging and Regularisation." Artificial Intelligence and Statistics, 2023.

Markdown

[Patil et al. "Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation: Tail Averaging and Regularisation." Artificial Intelligence and Statistics, 2023.](https://mlanthology.org/aistats/2023/patil2023aistats-finite/)

BibTeX

@inproceedings{patil2023aistats-finite,
  title     = {{Finite Time Analysis of Temporal Difference Learning with Linear Function Approximation: Tail Averaging and Regularisation}},
  author    = {Patil, Gandharv and L.A., Prashanth and Nagaraj, Dheeraj and Precup, Doina},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {5438--5448},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/patil2023aistats-finite/}
}