On the Global Convergence of Fitted Q-Iteration with Two-Layer Neural Network Parametrization

Abstract

Deep Q-learning based algorithms have been applied successfully to many decision-making problems, yet their theoretical foundations are not as well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parametrization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require any assumptions such as a linear or low-rank structure on the MDP.
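To make the setting concrete, below is a minimal sketch of the generic Fitted Q-Iteration loop the abstract refers to: each iteration regresses a two-layer ReLU network Q(s, a) onto Bellman targets computed from the previous estimate. This is an illustrative sketch, not the paper's method; in particular, the paper solves an equivalent convex program at each iteration, whereas the sketch below uses plain gradient descent for the regression step. All names (`TwoLayerQ`, `fitted_q_iteration`) and the dataset interface are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TwoLayerQ:
    """Two-layer ReLU network: Q(s, a) = sum_i c_i * relu(w_i . [s; a])."""
    def __init__(self, in_dim, width=64, lr=1e-2):
        self.W = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(width, in_dim))
        self.c = rng.normal(scale=1.0 / np.sqrt(width), size=width)
        self.lr = lr

    def forward(self, x):
        # x: (batch, in_dim) -> Q-values of shape (batch,)
        return relu(x @ self.W.T) @ self.c

    def fit(self, x, target, steps=200):
        # Least-squares regression onto the Bellman targets. Plain gradient
        # descent here; the paper instead solves a convex reformulation of
        # this fitting step at every iteration.
        for _ in range(steps):
            h = relu(x @ self.W.T)                      # (batch, width)
            err = h @ self.c - target                   # (batch,)
            grad_c = h.T @ err / len(x)
            grad_W = ((err[:, None] * (h > 0)) * self.c).T @ x / len(x)
            self.c -= self.lr * grad_c
            self.W -= self.lr * grad_W

def fitted_q_iteration(dataset, n_actions, gamma=0.99, iters=50):
    # dataset: list of (s, a, r, s') transitions, s given as 1-D features.
    s, a, r, s2 = map(np.array, zip(*dataset))

    def xa(states, actions):
        # Concatenate state features with a one-hot action encoding.
        onehot = np.eye(n_actions)[actions]
        return np.concatenate([states, onehot], axis=1)

    q = TwoLayerQ(in_dim=s.shape[1] + n_actions)
    for _ in range(iters):
        # Bellman targets from the current estimate: r + gamma * max_b Q(s', b).
        q_next = np.stack(
            [q.forward(xa(s2, np.full(len(s2), b, dtype=int)))
             for b in range(n_actions)], axis=1)
        target = r + gamma * q_next.max(axis=1)
        q.fit(xa(s, a), target)   # regress Q(s, a) onto the fixed targets
    return q
```

The key structural point matching the abstract is that each iteration is a supervised regression problem with frozen targets; the paper's contribution is showing that this fitting step admits a convex formulation for two-layer ReLU networks, yielding the $\tilde{\mathcal{O}}(1/\epsilon^{2})$ sample complexity.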

Cite

Text

Gaur et al. "On the Global Convergence of Fitted Q-Iteration with Two-Layer Neural Network Parametrization." International Conference on Machine Learning, 2023.

Markdown

[Gaur et al. "On the Global Convergence of Fitted Q-Iteration with Two-Layer Neural Network Parametrization." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/gaur2023icml-global/)

BibTeX

@inproceedings{gaur2023icml-global,
  title     = {{On the Global Convergence of Fitted Q-Iteration with Two-Layer Neural Network Parametrization}},
  author    = {Gaur, Mudit and Aggarwal, Vaneet and Agarwal, Mridul},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {11013--11049},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/gaur2023icml-global/}
}