An Improved Finite-Time Analysis of Temporal Difference Learning with Deep Neural Networks

Abstract

Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these algorithms remains challenging due to the nonlinearity of the action-value approximation. In this paper, we develop an improved non-asymptotic analysis of the neural TD method with a general $L$-layer neural network. New proof techniques are developed, and an improved $\tilde{\mathcal{O}}(\epsilon^{-1})$ sample complexity is derived. To the best of our knowledge, this is the first finite-time analysis of neural TD that achieves an $\tilde{\mathcal{O}}(\epsilon^{-1})$ complexity under Markovian sampling, as opposed to the best known $\tilde{\mathcal{O}}(\epsilon^{-2})$ complexity in the existing literature.
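For readers unfamiliar with the setting, below is a minimal sketch of the semi-gradient neural TD(0) update for action-value policy evaluation along a single Markovian trajectory, which is the style of algorithm the paper analyzes. The environment dimensions, network width and depth, step size, and helper names here are illustrative assumptions, not the paper's actual construction or experimental setup.

```python
# Sketch: semi-gradient TD(0) with a neural action-value approximator,
# updated from consecutive (Markovian) transitions of one trajectory.
# All sizes and hyperparameters below are assumed for illustration.
import torch
import torch.nn as nn

gamma, alpha = 0.99, 1e-3          # discount factor, step size (assumed)
state_dim, num_actions = 4, 2      # assumed environment dimensions

# A multi-layer MLP standing in for the paper's general L-layer network;
# the depth/width chosen here are arbitrary.
q_net = nn.Sequential(
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, num_actions),
)
opt = torch.optim.SGD(q_net.parameters(), lr=alpha)

def td0_step(s, a, r, s_next, a_next):
    """One semi-gradient TD(0) update from a single transition."""
    q_sa = q_net(s)[a]
    with torch.no_grad():  # semi-gradient: no gradient through the target
        target = r + gamma * q_net(s_next)[a_next]
    td_error = q_sa - target
    loss = 0.5 * td_error ** 2
    opt.zero_grad()
    loss.backward()        # gradient flows only through Q(s, a)
    opt.step()
```

The nonlinearity of `q_net` in its parameters is precisely what makes the finite-time analysis of such updates difficult relative to the linear case.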

Cite

Text

Ke et al. "An Improved Finite-Time Analysis of Temporal Difference Learning with Deep Neural Networks." International Conference on Machine Learning, 2024.

Markdown

[Ke et al. "An Improved Finite-Time Analysis of Temporal Difference Learning with Deep Neural Networks." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/ke2024icml-improved/)

BibTeX

@inproceedings{ke2024icml-improved,
  title     = {{An Improved Finite-Time Analysis of Temporal Difference Learning with Deep Neural Networks}},
  author    = {Ke, Zhifa and Wen, Zaiwen and Zhang, Junyu},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {23407--23429},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/ke2024icml-improved/}
}