Asymptotic Analysis of Temporal-Difference Learning Algorithms with Constant Step-Sizes

Abstract

The mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation is analyzed in this paper. The analysis is carried out for the case of discounted cost function associated with a Markov chain with a finite dimensional state-space. Under mild conditions, an upper bound for the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Moreover, under the same assumptions, it is also shown that this bound is linear in the step size. The main results of the paper are illustrated with examples related to M / G/ 1 queues and nonlinear AR models with Markov switching.

Cite

Text

Tadic. "Asymptotic Analysis of Temporal-Difference Learning Algorithms with Constant Step-Sizes." Machine Learning, 2006. doi:10.1007/S10994-006-5835-Z

Markdown

[Tadic. "Asymptotic Analysis of Temporal-Difference Learning Algorithms with Constant Step-Sizes." Machine Learning, 2006.](https://mlanthology.org/mlj/2006/tadic2006mlj-asymptotic/) doi:10.1007/S10994-006-5835-Z

BibTeX

@article{tadic2006mlj-asymptotic,
  title     = {{Asymptotic Analysis of Temporal-Difference Learning Algorithms with Constant Step-Sizes}},
  author    = {Tadic, Vladislav B.},
  journal   = {Machine Learning},
  year      = {2006},
  pages     = {107-133},
  doi       = {10.1007/S10994-006-5835-Z},
  volume    = {63},
  url       = {https://mlanthology.org/mlj/2006/tadic2006mlj-asymptotic/}
}