On the Linear Convergence of Policy Gradient Methods for Finite MDPs

Abstract

We revisit the finite-time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations. Some recent work views this setting as an instance of a smooth non-linear optimization problem and establishes sub-linear convergence rates under small step-sizes. Here, we take a completely different perspective based on illuminating connections with policy iteration, and show how many variants of policy gradient algorithms succeed with large step-sizes and attain a linear rate of convergence.
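The sketch below illustrates the setting the abstract describes, not the paper's analysis: exact policy gradient ascent on a small tabular MDP with exact policy evaluation. The random MDP, the softmax parameterization, and the step-size value are illustrative assumptions introduced here for the example.

```python
import numpy as np

# Small random MDP (assumed for illustration): S states, A actions, discount gamma.
np.random.seed(0)
S, A, gamma = 4, 3, 0.9
P = np.random.dirichlet(np.ones(S), size=(S, A))  # P[s, a] is a distribution over next states
R = np.random.rand(S, A)                          # rewards r(s, a)
rho = np.ones(S) / S                              # initial state distribution

def policy_eval(pi):
    """Exact evaluation: solve (I - gamma * P_pi) V = r_pi, then form Q."""
    P_pi = np.einsum('sa,sat->st', pi, P)         # state-to-state transitions under pi
    r_pi = np.einsum('sa,sa->s', pi, R)           # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * P @ V                         # Q(s, a) = r(s, a) + gamma * E[V(s')]
    return V, Q

def softmax(theta):
    z = theta - theta.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

theta = np.zeros((S, A))                          # softmax policy parameters
eta = 5.0                                         # a large step-size (illustrative choice)
for _ in range(200):
    pi = softmax(theta)
    V, Q = policy_eval(pi)
    # Policy gradient theorem for the softmax parameterization: the gradient is
    # proportional to d_rho(s) * pi(a|s) * (Q(s, a) - V(s)), where d_rho is the
    # discounted state-occupancy measure under pi.
    P_pi = np.einsum('sa,sat->st', pi, P)
    d_rho = (1 - gamma) * np.linalg.solve(np.eye(S) - gamma * P_pi.T, rho)
    theta += eta * d_rho[:, None] * pi * (Q - V[:, None])

V, _ = policy_eval(softmax(theta))
print("J(pi) after training:", rho @ V)

# Value-iteration baseline to check how close the learned policy is to optimal.
V_star = np.zeros(S)
for _ in range(1000):
    V_star = (R + gamma * P @ V_star).max(axis=1)
print("J(pi*):", rho @ V_star)
```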

Cite

Text

Bhandari and Russo. "On the Linear Convergence of Policy Gradient Methods for Finite MDPs." Artificial Intelligence and Statistics, 2021.

Markdown

[Bhandari and Russo. "On the Linear Convergence of Policy Gradient Methods for Finite MDPs." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/bhandari2021aistats-linear/)

BibTeX

@inproceedings{bhandari2021aistats-linear,
  title     = {{On the Linear Convergence of Policy Gradient Methods for Finite MDPs}},
  author    = {Bhandari, Jalaj and Russo, Daniel},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {2386--2394},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/bhandari2021aistats-linear/}
}