On the Distributional Convergence of Temporal Difference Learning
Abstract
Temporal Difference (TD) learning is one of the simplest yet most effective algorithms for policy evaluation in reinforcement learning. Although finite-time convergence results for the TD algorithm are now abundant, its distributional convergence has remained unexplored. This paper shows that TD with a constant step size simulates a Markov chain converging to a stationary distribution, under both i.i.d. and Markov chain observation models. We prove that TD enjoys a geometric rate of distributional convergence and show how the step size affects the expectation and covariance of the stationary distribution. All assumptions used in our paper are mild and common in the TD literature. Our results indicate a tradeoff between convergence speed and accuracy for TD. Based on these theoretical findings, we explain why the Jacobi preconditioner can accelerate TD algorithms.
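To make the abstract's claim concrete, the sketch below runs tabular TD(0) with a constant step size on a small made-up Markov reward process (the transition matrix `P`, rewards `r`, and discount `gamma` are illustrative assumptions, not from the paper). Under a constant step size the iterates do not settle on a point; they fluctuate around a limit, consistent with convergence in distribution, and a larger step size yields faster initial progress but a wider stationary spread, the speed-accuracy tradeoff the abstract describes. This is a minimal illustration, not the authors' code.

```python
# Minimal sketch (not the paper's code): tabular TD(0) with a constant step
# size on a hypothetical 3-state Markov reward process. With a constant step
# size the value-estimate iterates keep fluctuating around the true values,
# and the spread of that fluctuation grows with the step size.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MRP: row-stochastic transition matrix P, per-state rewards r.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.3, 0.5]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Ground-truth values for reference: v* = (I - gamma * P)^{-1} r.
v_star = np.linalg.solve(np.eye(3) - gamma * P, r)

def td0_constant_step(alpha, n_steps=200_000, burn_in=100_000):
    """Run TD(0) with constant step size alpha; return post-burn-in iterates."""
    v = np.zeros(3)
    s = 0
    tail = []
    for t in range(n_steps):
        s_next = rng.choice(3, p=P[s])
        # TD(0) update: v(s) += alpha * (r(s) + gamma * v(s') - v(s)).
        v[s] += alpha * (r[s] + gamma * v[s_next] - v[s])
        if t >= burn_in:
            tail.append(v.copy())
        s = s_next
    return np.array(tail)

for alpha in (0.01, 0.1):
    tail = td0_constant_step(alpha)
    bias = np.linalg.norm(tail.mean(axis=0) - v_star)
    spread = tail.std(axis=0).mean()
    print(f"alpha={alpha}: mean distance to v* = {bias:.4f}, "
          f"avg per-state std of iterates = {spread:.4f}")
```

Running this, the smaller step size concentrates the iterates more tightly around `v_star` at the cost of a longer burn-in, mirroring how the paper relates the step size to the mean and covariance of the stationary distribution.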
Cite
Text
Dai and Chen. "On the Distributional Convergence of Temporal Difference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43421-1_26
Markdown
[Dai and Chen. "On the Distributional Convergence of Temporal Difference Learning." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/dai2023ecmlpkdd-distributional/) doi:10.1007/978-3-031-43421-1_26
BibTeX
@inproceedings{dai2023ecmlpkdd-distributional,
title = {{On the Distributional Convergence of Temporal Difference Learning}},
author = {Dai, Jie and Chen, Xuguang},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {439--454},
doi = {10.1007/978-3-031-43421-1_26},
url = {https://mlanthology.org/ecmlpkdd/2023/dai2023ecmlpkdd-distributional/}
}