On the Distributional Convergence of Temporal Difference Learning
Abstract
Temporal Difference (TD) learning is one of the simplest yet most effective algorithms for policy evaluation in reinforcement learning. Although finite-time convergence results for the TD algorithm are now abundant, its distributional convergence has remained unexplored. This paper shows that TD with a constant step size simulates a Markov chain converging to a stationary distribution, under both i.i.d. and Markov chain observation models. We prove that TD enjoys a geometric rate of distributional convergence and show how the step size affects the expectation and covariance of the stationary distribution. All assumptions used in our paper are mild and common in the TD literature. Our results indicate a tradeoff between convergence speed and accuracy for TD. Based on these theoretical findings, we explain why the Jacobi preconditioner can accelerate TD algorithms.
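To make the abstract's claim concrete, the sketch below runs tabular TD(0) with a constant step size on a small made-up Markov reward process (the transition matrix `P`, rewards `r`, and discount `gamma` are illustrative assumptions, not from the paper). Under a constant step size the iterates do not settle on a point; they fluctuate around a limit, consistent with convergence in distribution, and a larger step size yields faster initial progress but a wider stationary spread, the speed-accuracy tradeoff the abstract describes. This is a minimal illustration, not the authors' code.

```python
# Minimal sketch (not the paper's code): tabular TD(0) with a constant step
# size on a hypothetical 3-state Markov reward process. With a constant step
# size the value-estimate iterates keep fluctuating around the true values,
# and the spread of that fluctuation grows with the step size.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MRP: row-stochastic transition matrix P, per-state rewards r.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Ground-truth values for reference: v* = (I - gamma * P)^{-1} r.
v_star = np.linalg.solve(np.eye(3) - gamma * P, r)

def td0_constant_step(alpha, n_steps=200_000, burn_in=100_000):
    """Run TD(0) with constant step size alpha; return post-burn-in iterates."""
    v = np.zeros(3)
    s = 0
    tail = []
    for t in range(n_steps):
        s_next = rng.choice(3, p=P[s])
        # TD(0) update: v(s) += alpha * (r(s) + gamma * v(s') - v(s)).
        v[s] += alpha * (r[s] + gamma * v[s_next] - v[s])
        if t >= burn_in:
            tail.append(v.copy())
        s = s_next
    return np.array(tail)

for alpha in (0.01, 0.1):
    tail = td0_constant_step(alpha)
    bias = np.linalg.norm(tail.mean(axis=0) - v_star)
    spread = tail.std(axis=0).mean()
    print(f"alpha={alpha}: mean distance to v* = {bias:.4f}, "
          f"avg per-state std of iterates = {spread:.4f}")
```

Running this, the smaller step size concentrates the iterates more tightly around `v_star` at the cost of a longer burn-in, mirroring how the paper relates the step size to the mean and covariance of the stationary distribution.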
Cite
Text
Dai and Chen. "On the Distributional Convergence of Temporal Difference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_26
Markdown
[Dai and Chen. "On the Distributional Convergence of Temporal Difference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/dai2023ecmlpkdd-distributional/) doi:10.1007/978-3-031-43421-1_26
BibTeX
@inproceedings{dai2023ecmlpkdd-distributional,
title = {{On the Distributional Convergence of Temporal Difference Learning}},
author = {Dai, Jie and Chen, Xuguang},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {439--454},
doi = {10.1007/978-3-031-43421-1_26},
url = {https://mlanthology.org/ecmlpkdd/2023/dai2023ecmlpkdd-distributional/}
}