Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
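To make the alternating actor-critic structure concrete, here is a minimal, hypothetical Python sketch on a scalar LQR problem. It is not the algorithm analyzed in the paper: the critic below is an idealized, model-based policy-evaluation stand-in (the paper's critic estimates the action-value function from data under the ergodic cost), and the system parameters `a, b, q, r` and helpers `evaluate_policy`, `actor_step`, `riccati_gain` are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact algorithm) of alternating
# actor-critic updates on a scalar ergodic-cost LQR problem:
#   x_{t+1} = a*x_t + b*u_t + w_t,   cost c(x, u) = q*x^2 + r*u^2,
# with a linear policy u = -k*x. The critic step here evaluates the
# current policy exactly; the paper's critic is estimated from samples.

# Hypothetical problem parameters (chosen so the loop can be stabilized).
a, b, q, r = 1.1, 0.5, 1.0, 0.1


def evaluate_policy(k):
    """Critic step (idealized): evaluate the gain k.

    Solves the scalar Lyapunov equation p = q + r*k^2 + (a - b*k)^2 * p
    for the relative value function V_k(x) = p*x^2, then returns the
    parameters of the quadratic action-value function
        Q_k(x, u) ~ [x u] H [x; u],  with H_uu = r + b^2*p, H_ux = a*b*p.
    """
    closed_loop = a - b * k
    assert abs(closed_loop) < 1.0, "policy must be stabilizing"
    p = (q + r * k**2) / (1.0 - closed_loop**2)
    h_uu = r + b**2 * p
    h_ux = a * b * p
    return h_uu, h_ux


def actor_step(k, h_uu, h_ux, lr=0.05):
    """Actor step: natural policy gradient on the gain k.

    For LQR the natural gradient of the average cost w.r.t. k is
    proportional to (H_uu*k - H_ux), so a small step moves k toward
    the greedy gain H_ux / H_uu.
    """
    return k - lr * 2.0 * (h_uu * k - h_ux)


def riccati_gain():
    """Optimal gain from the scalar algebraic Riccati equation (for checking only)."""
    p = 1.0
    for _ in range(10_000):
        p = q + a**2 * p - (a * b * p) ** 2 / (r + b**2 * p)
    return a * b * p / (r + b**2 * p)


if __name__ == "__main__":
    k = 1.0  # any stabilizing initial gain (|a - b*k| < 1)
    for _ in range(200):
        h_uu, h_ux = evaluate_policy(k)   # critic: evaluate current policy
        k = actor_step(k, h_uu, h_ux)     # actor: improve the policy
    print(f"learned gain k = {k:.4f}, Riccati-optimal gain = {riccati_gain():.4f}")
```

Under these assumed parameters the alternating iterates drive the gain toward the Riccati-optimal gain, which is the kind of global convergence (at a linear rate, in the paper's model-free setting) that the abstract refers to.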

Cite

Text

Yang et al. "Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost." Neural Information Processing Systems, 2019.

Markdown

[Yang et al. "Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/yang2019neurips-provably/)

BibTeX

@inproceedings{yang2019neurips-provably,
  title     = {{Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost}},
  author    = {Yang, Zhuoran and Chen, Yongxin and Hong, Mingyi and Wang, Zhaoran},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {8353--8365},
  url       = {https://mlanthology.org/neurips/2019/yang2019neurips-provably/}
}