A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation
Abstract
Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtually unknown. In this paper, we present a finite-time analysis of a neural Q-learning algorithm, where the data are generated from a Markov decision process, and the action-value function is approximated by a deep ReLU neural network. We prove that neural Q-learning finds the optimal policy with an $O(1/\sqrt{T})$ convergence rate if the neural function approximator is sufficiently overparameterized, where $T$ is the number of iterations. To the best of our knowledge, our result is the first finite-time analysis of neural Q-learning under the non-i.i.d. data assumption.
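To make the setting concrete, below is a minimal sketch of Q-learning with a ReLU network trained along a single Markovian (non-i.i.d.) trajectory. This is an illustrative toy, not the paper's exact algorithm: the toy MDP, the two-layer network of width 64, the epsilon-greedy behavior policy, and all hyperparameters are assumptions chosen for readability, and the paper's analysis additionally relies on specific overparameterization and initialization conditions not reproduced here.

```python
"""Minimal neural Q-learning sketch on a toy MDP (illustrative assumptions throughout)."""
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: random transition kernel P[s, a] (a distribution over next states) and rewards R[s, a].
n_states, n_actions, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

def featurize(s, a):
    """One-hot state-action feature vector."""
    x = np.zeros(n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x

# Two-layer ReLU network Q(s, a) = v^T relu(W x(s, a)); only W is trained.
d, m = n_states * n_actions, 64
W = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))
v = rng.choice([-1.0, 1.0], size=m)

def q_value(s, a):
    return v @ np.maximum(W @ featurize(s, a), 0.0)

# Semi-gradient Q-learning along one Markovian trajectory (non-i.i.d. data).
eta, eps, T = 0.05, 0.1, 5000
s = 0
for t in range(T):
    # Epsilon-greedy behavior policy.
    if rng.random() < eps:
        a = int(rng.integers(n_actions))
    else:
        a = int(np.argmax([q_value(s, b) for b in range(n_actions)]))
    s_next = int(rng.choice(n_states, p=P[s, a]))
    r = R[s, a]
    # Q-learning target uses a max over next actions.
    target = r + gamma * max(q_value(s_next, b) for b in range(n_actions))
    delta = q_value(s, a) - target
    # Gradient of Q(s, a) with respect to W (output layer v kept fixed).
    x = featurize(s, a)
    active = (W @ x) > 0
    grad_W = (v * active).reshape(-1, 1) * x.reshape(1, -1)
    W -= eta * delta * grad_W
    s = s_next

q_table = [[q_value(si, ai) for ai in range(n_actions)] for si in range(n_states)]
print("learned Q-values:\n", np.round(q_table, 2))
```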
Cite
Text
Xu and Gu. "A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation." International Conference on Machine Learning, 2020.
Markdown
[Xu and Gu. "A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/xu2020icml-finitetime/)
BibTeX
@inproceedings{xu2020icml-finitetime,
title = {{A Finite-Time Analysis of Q-Learning with Neural Network Function Approximation}},
author = {Xu, Pan and Gu, Quanquan},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {10555--10565},
volume = {119},
url = {https://mlanthology.org/icml/2020/xu2020icml-finitetime/}
}