Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies

Abstract

We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class. Using the compatible function approximation framework, both methods with log-linear policies can be written as approximate versions of the policy mirror descent (PMD) method. We show that both methods attain linear convergence rates and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexities using a simple, non-adaptive geometrically increasing step size, without resorting to entropy or other strongly convex regularization. Lastly, as a byproduct, we obtain sublinear convergence rates for both methods with an arbitrary constant step size.
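To make the algorithmic ingredients of the abstract concrete, here is a minimal sketch of a Q-NPG-style update with a log-linear policy on a small synthetic MDP. It is not the authors' implementation: the MDP, the features `phi`, the uniform state-action weighting in the least-squares fit, and the exact (rather than sample-based) policy evaluation are all illustrative assumptions; the geometric step-size schedule `eta /= gamma` is one simple instance of a geometrically increasing step size.

```python
# Sketch: Q-NPG with a log-linear policy pi_theta(a|s) ~ exp(phi(s,a)^T theta)
# on a random tabular MDP. All constants and the weighting are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, dim, gamma = 5, 3, 4, 0.9

# Random transition kernel P[s, a] (distribution over next states),
# random rewards R[s, a], and random state-action features phi(s, a) in R^dim.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))
phi = rng.normal(size=(n_states, n_actions, dim))

def policy(theta):
    """Log-linear policy: pi(a|s) proportional to exp(phi(s,a)^T theta)."""
    logits = phi @ theta
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def q_values(pi):
    """Exact policy evaluation: solve Q = R + gamma * P^pi Q (stands in for sampling)."""
    P_pi = np.einsum("sap,pb->sapb", P, pi).reshape(n_states * n_actions,
                                                    n_states * n_actions)
    q = np.linalg.solve(np.eye(n_states * n_actions) - gamma * P_pi, R.reshape(-1))
    return q.reshape(n_states, n_actions)

def qnpg_step(theta, eta):
    """One Q-NPG update: fit Q^pi with the features (compatible function
    approximation under a uniform weighting, an assumption here), then move
    theta along the fitted weights with step size eta."""
    pi = policy(theta)
    Q = q_values(pi)
    X = phi.reshape(-1, dim)
    w, *_ = np.linalg.lstsq(X, Q.reshape(-1), rcond=None)
    return theta + eta * w

theta = np.zeros(dim)
eta = 1.0
for k in range(20):
    theta = qnpg_step(theta, eta)
    eta /= gamma  # geometrically increasing step size, no entropy regularization
    pi = policy(theta)
    avg_value = (pi * q_values(pi)).sum(axis=1).mean()
    print(f"iter {k:2d}  average value {avg_value:.4f}")
```

With exact evaluation, the average value typically climbs quickly and then plateaus; the paper's analysis concerns the rate of this convergence (linear with the geometric schedule, sublinear with a constant step size) when the evaluation is approximate and sample-based.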

Cite

Text

Yuan et al. "Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies." International Conference on Learning Representations, 2023.

Markdown

[Yuan et al. "Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/yuan2023iclr-linear/)

BibTeX

@inproceedings{yuan2023iclr-linear,
  title     = {{Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies}},
  author    = {Yuan, Rui and Du, Simon Shaolei and Gower, Robert M. and Lazaric, Alessandro and Xiao, Lin},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/yuan2023iclr-linear/}
}