On the Performance of Temporal Difference Learning with Neural Networks

Abstract

Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven challenging. In this paper, we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, a ball of fixed radius $\omega$ around the initial point $\theta_0$. We show an approximation bound of $O(\epsilon + 1/\sqrt{m})$, where $\epsilon$ is the approximation quality of the best neural network in $B(\theta_0, \omega)$ and $m$ is the width of all hidden layers in the network.
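
For intuition, here is a minimal sketch of projected Neural TD(0) in the spirit of the setup the abstract describes: a semi-gradient TD update on a width-$m$ one-hidden-layer ReLU network, followed by a Euclidean projection back onto $B(\theta_0, \omega)$. The architecture, step size, and the sampled transition are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Illustrative sketch of projected Neural TD(0) for policy evaluation.
# V(s; theta) is a one-hidden-layer ReLU network of width m; theta = (W, b).
rng = np.random.default_rng(0)
d, m = 4, 64                            # state dimension, hidden width
gamma, alpha, omega = 0.99, 0.01, 10.0  # discount, step size, ball radius

# Initial point theta_0 = (W0, b0); the projection ball is centered here.
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
b0 = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
W, b = W0.copy(), b0.copy()

def value(s, W, b):
    # V(s; theta) = b^T ReLU(W s)
    return b @ np.maximum(W @ s, 0.0)

def grads(s, W, b):
    # Gradients of V(s; theta) with respect to W and b.
    pre = W @ s
    h = np.maximum(pre, 0.0)
    dV_db = h
    dV_dW = ((pre > 0).astype(float) * b)[:, None] * s[None, :]
    return dV_dW, dV_db

def project(W, b, W0, b0, omega):
    # Euclidean projection of theta onto B(theta_0, omega): rescale the
    # displacement from theta_0 back to the boundary if it exceeds omega.
    diff = np.concatenate([(W - W0).ravel(), b - b0])
    norm = np.linalg.norm(diff)
    if norm > omega:
        scale = omega / norm
        W = W0 + scale * (W - W0)
        b = b0 + scale * (b - b0)
    return W, b

def td0_step(s, r, s_next, W, b):
    # Semi-gradient TD(0) update followed by projection onto the ball.
    delta = r + gamma * value(s_next, W, b) - value(s, W, b)  # TD error
    gW, gb = grads(s, W, b)
    W = W + alpha * delta * gW
    b = b + alpha * delta * gb
    return project(W, b, W0, b0, omega)

# One step on a made-up transition (s, r, s'); a real run would iterate
# over a trajectory generated by the policy being evaluated.
s, r, s_next = rng.normal(size=d), 1.0, rng.normal(size=d)
W, b = td0_step(s, r, s_next, W, b)
```

The projection step is what keeps every iterate within distance $\omega$ of $\theta_0$, the constraint under which the analysis in the paper is carried out.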

Cite

Text

Tian et al. "On the Performance of Temporal Difference Learning with Neural Networks." International Conference on Learning Representations, 2023.

Markdown

[Tian et al. "On the Performance of Temporal Difference Learning with Neural Networks." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/tian2023iclr-performance/)

BibTeX

@inproceedings{tian2023iclr-performance,
  title     = {{On the Performance of Temporal Difference Learning with Neural Networks}},
  author    = {Tian, Haoxing and Paschalidis, Ioannis and Olshevsky, Alex},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/tian2023iclr-performance/}
}