On the Performance of Temporal Difference Learning with Neural Networks
Abstract
Neural Temporal Difference (TD) Learning is an approximate policy-evaluation method that uses a neural network for value-function approximation. Analyzing Neural TD Learning has proven challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(\theta_0, \omega)$, the ball of fixed radius $\omega$ around the initial parameters $\theta_0$. We show an approximation bound of $O(\epsilon + 1/\sqrt{m})$, where $\epsilon$ is the approximation error of the best neural network in $B(\theta_0, \omega)$ and $m$ is the width of every hidden layer in the network.
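The projected scheme analyzed in the abstract is simple to state: take a semi-gradient TD step, then pull the parameters back onto $B(\theta_0, \omega)$ whenever they leave the ball. The following is a minimal PyTorch sketch of that idea, not the authors' implementation; the network, step size, discount factor, and the `env` interface (`reset`, `sample_action`, `step`) are illustrative assumptions.

```python
import torch

def projected_neural_td(env, value_net, gamma=0.99, lr=1e-3,
                        omega=10.0, num_steps=10_000):
    """Sketch of Neural TD(0) with projection onto B(theta_0, omega).

    `env` is a hypothetical environment exposing reset(), sample_action(s)
    (a fixed behavior policy), and step(a) -> (s_next, reward).
    """
    # Flatten the initialization theta_0; the projection is taken
    # in this flat Euclidean parameter space.
    theta0 = torch.nn.utils.parameters_to_vector(
        value_net.parameters()).detach()
    s = env.reset()
    for _ in range(num_steps):
        a = env.sample_action(s)
        s_next, r = env.step(a)
        v = value_net(s)
        with torch.no_grad():
            # Semi-gradient: no gradient flows through the bootstrap target.
            target = r + gamma * value_net(s_next)
        loss = 0.5 * (v - target) ** 2
        value_net.zero_grad()
        loss.backward()
        with torch.no_grad():
            # TD update on the parameters.
            for p in value_net.parameters():
                p -= lr * p.grad
            # Project back onto the ball B(theta_0, omega) if the
            # update stepped outside it.
            theta = torch.nn.utils.parameters_to_vector(
                value_net.parameters())
            d = theta - theta0
            if d.norm() > omega:
                torch.nn.utils.vector_to_parameters(
                    theta0 + omega * d / d.norm(), value_net.parameters())
        s = s_next
    return value_net
```

The projection is Euclidean, so it amounts to rescaling the displacement $\theta - \theta_0$ to have norm $\omega$; this keeps the iterates in the regime where the abstract's $O(\epsilon + 1/\sqrt{m})$ bound is stated.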
Cite
Text
Tian et al. "On the Performance of Temporal Difference Learning with Neural Networks." International Conference on Learning Representations, 2023.

Markdown

[Tian et al. "On the Performance of Temporal Difference Learning with Neural Networks." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/tian2023iclr-performance/)

BibTeX
@inproceedings{tian2023iclr-performance,
  title = {{On the Performance of Temporal Difference Learning with Neural Networks}},
  author = {Tian, Haoxing and Paschalidis, Ioannis and Olshevsky, Alex},
  booktitle = {International Conference on Learning Representations},
  year = {2023},
  url = {https://mlanthology.org/iclr/2023/tian2023iclr-performance/}
}