Convergence of Actor-Critic with Multi-Layer Neural Networks
Abstract
The early theory of actor-critic methods considered convergence using linear function approximators for the policy and value functions. Recent work has established convergence using neural network approximators with a single hidden layer. In this work we take the natural next step and establish convergence using deep neural networks with an arbitrary number of hidden layers, thus closing a gap between theory and practice. We show that actor-critic updates projected onto a ball around the initial condition converge to a neighborhood where the average of the squared gradients is $\tilde{O}\left( 1/\sqrt{m} \right) + O\left( \epsilon \right)$, where $m$ is the width of the neural network and $\epsilon$ is the approximation quality of the best critic neural network over the projected set.
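To make the projection scheme in the abstract concrete, the following is a minimal sketch (not the authors' code) of projected actor-critic updates with multi-layer MLPs for both actor and critic, assuming PyTorch. The toy random MDP, the hidden width `WIDTH` (playing the role of $m$), the depth, the projection radius, and all step sizes are illustrative choices, not the paper's settings; the only point is that after each update both networks are projected back onto an L2 ball around their initialization.

```python
# Sketch of projected actor-critic with deep MLP actor and critic on a toy MDP.
# All hyperparameters and the MDP itself are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_STATES, N_ACTIONS, WIDTH, DEPTH = 5, 3, 64, 3   # WIDTH plays the role of m
RADIUS, LR_ACTOR, LR_CRITIC, GAMMA = 5.0, 1e-3, 1e-2, 0.9

def mlp(in_dim, out_dim):
    # Multi-layer network: an arbitrary number of hidden layers of width WIDTH.
    layers, d = [], in_dim
    for _ in range(DEPTH):
        layers += [nn.Linear(d, WIDTH), nn.ReLU()]
        d = WIDTH
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

actor, critic = mlp(N_STATES, N_ACTIONS), mlp(N_STATES, 1)
theta0 = [p.detach().clone() for p in actor.parameters()]   # initial conditions
w0 = [p.detach().clone() for p in critic.parameters()]

def project(params, init, radius):
    """Project parameters onto the L2 ball of the given radius around init."""
    with torch.no_grad():
        diff = torch.cat([(p - q).flatten() for p, q in zip(params, init)])
        norm = diff.norm()
        if norm > radius:
            scale = radius / norm
            for p, q in zip(params, init):
                p.copy_(q + scale * (p - q))

# Toy random MDP: fixed transition kernel P and reward table R (illustrative).
P = torch.softmax(torch.randn(N_STATES, N_ACTIONS, N_STATES), dim=-1)
R = torch.randn(N_STATES, N_ACTIONS)

def one_hot(s):
    return torch.eye(N_STATES)[s]

s = torch.randint(N_STATES, ()).item()
for step in range(1000):
    x = one_hot(s)
    dist = torch.distributions.Categorical(logits=actor(x))
    a = dist.sample()
    s_next = torch.distributions.Categorical(probs=P[s, a]).sample().item()
    r = R[s, a]

    # TD(0) error from the critic; the bootstrap target is treated as a constant.
    v, v_next = critic(x), critic(one_hot(s_next)).detach()
    td_error = r + GAMMA * v_next - v

    # Critic: semi-gradient step on the squared TD error.
    critic.zero_grad()
    (td_error ** 2).backward()
    with torch.no_grad():
        for p in critic.parameters():
            p -= LR_CRITIC * p.grad

    # Actor: policy-gradient step with the TD error as the advantage estimate.
    actor.zero_grad()
    (-td_error.detach() * dist.log_prob(a)).backward()
    with torch.no_grad():
        for p in actor.parameters():
            p -= LR_ACTOR * p.grad

    # Projection keeps both networks inside a ball around their initialization.
    project(list(actor.parameters()), theta0, RADIUS)
    project(list(critic.parameters()), w0, RADIUS)
    s = s_next
```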
Cite
Text
Tian et al. "Convergence of Actor-Critic with Multi-Layer Neural Networks." Neural Information Processing Systems, 2023.
Markdown
[Tian et al. "Convergence of Actor-Critic with Multi-Layer Neural Networks." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/tian2023neurips-convergence/)
BibTeX
@inproceedings{tian2023neurips-convergence,
title = {{Convergence of Actor-Critic with Multi-Layer Neural Networks}},
author = {Tian, Haoxing and Olshevsky, Alex and Paschalidis, Yannis},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/tian2023neurips-convergence/}
}