On the Convergence of Temporal-Difference Learning with Linear Function Approximation
Abstract
This paper analyzes the asymptotic properties of temporal-difference learning algorithms with linear function approximation. The analysis is carried out in the context of approximating the discounted cost-to-go function of an uncontrolled Markov chain with an uncountable, finite-dimensional state space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established and an upper bound on their asymptotic approximation error is derived. The results generalize and extend existing results on the asymptotic behavior of temporal-difference learning. Moreover, they cover cases to which the existing results do not apply, while the adopted assumptions appear to be the weakest under which the almost sure convergence of temporal-difference learning algorithms can still be demonstrated.
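For concreteness, the algorithm class the paper studies is TD(λ) with a linearly parameterized value estimate V(x) = φ(x)ᵀθ, updated along a single trajectory of the chain with diminishing step sizes. The sketch below is illustrative only: the small finite chain, random features, costs, and step-size schedule are all assumptions made here for the example (the paper's analysis covers much more general state spaces), not details taken from the paper.

```python
import numpy as np

# Minimal sketch of TD(lambda) with linear function approximation.
# The chain, costs, features, and step sizes are illustrative assumptions.
rng = np.random.default_rng(0)

n_states = 5          # small finite chain for illustration only
gamma = 0.9           # discount factor
lam = 0.5             # eligibility-trace parameter lambda

# Random transition matrix (rows normalized) and per-state costs.
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)
cost = rng.random(n_states)

# Linear features phi(x); the value estimate is V(x) = phi(x) @ theta.
d = 3
Phi = rng.random((n_states, d))

theta = np.zeros(d)   # weight vector being learned
z = np.zeros(d)       # eligibility trace
x = 0
for t in range(200_000):
    x_next = rng.choice(n_states, p=P[x])
    # Temporal-difference error: delta = c(x) + gamma*V(x') - V(x).
    delta = cost[x] + gamma * Phi[x_next] @ theta - Phi[x] @ theta
    z = gamma * lam * z + Phi[x]        # accumulate eligibility trace
    alpha = 10.0 / (t + 100)            # diminishing (Robbins-Monro) step size
    theta = theta + alpha * delta * z   # TD(lambda) update
    x = x_next

print(theta)
```

The diminishing step-size schedule is the standard stochastic-approximation condition (steps summing to infinity with summable squares) under which almost sure convergence results of this kind are typically stated.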
Cite
Text
Tadic. "On the Convergence of Temporal-Difference Learning with Linear Function Approximation." Machine Learning, 2001. doi:10.1023/A:1007609817671

Markdown

[Tadic. "On the Convergence of Temporal-Difference Learning with Linear Function Approximation." Machine Learning, 2001.](https://mlanthology.org/mlj/2001/tadic2001mlj-convergence/) doi:10.1023/A:1007609817671

BibTeX
@article{tadic2001mlj-convergence,
title = {{On the Convergence of Temporal-Difference Learning with Linear Function Approximation}},
author = {Tadic, Vladislav},
journal = {Machine Learning},
year = {2001},
pages = {241-267},
doi = {10.1023/A:1007609817671},
volume = {42},
url = {https://mlanthology.org/mlj/2001/tadic2001mlj-convergence/}
}