An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods
Abstract
In this paper, we revisit and improve the convergence analysis of policy gradient (PG) and natural PG (NPG) methods, and of their variance-reduced variants, under general smooth policy parametrizations. More specifically, assuming the Fisher information matrix of the policy is positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance reduction into the NPG update. Our improvements follow from the observation that the convergence analyses of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied to NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Our analysis carefully integrates the advantages of these two lines of work. Thanks to this improvement, we have also made variance reduction for NPG possible for the first time, with both global convergence and an efficient finite-sample complexity.
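For orientation, the two update rules the abstract contrasts can be written in their textbook form. This is a sketch in assumed notation, not taken from the abstract itself: $J(\theta)$ denotes the expected return, $\eta$ a step size, $\pi_\theta$ the parametrized policy, $F(\theta)$ its Fisher information matrix under some state-action distribution, and $\mu_F$ a hypothetical positive-definiteness constant.

```latex
% Plain policy gradient (PG) step on the expected return J:
\theta_{k+1} = \theta_k + \eta \, \nabla_\theta J(\theta_k)

% Natural policy gradient (NPG) step: the gradient is preconditioned
% by the inverse of the Fisher information matrix of the policy:
\theta_{k+1} = \theta_k + \eta \, F(\theta_k)^{-1} \nabla_\theta J(\theta_k),
\qquad
F(\theta) = \mathbb{E}_{s,a}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \,
            \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right]

% Positive definiteness (assumed form): F(\theta) \succeq \mu_F I
% for some \mu_F > 0, so the inverse above is well defined.
```

In practice both the gradient and the Fisher matrix are estimated from sampled trajectories, which is where the variance-reduced estimators mentioned in the abstract would enter.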
Cite
Text
Liu et al. "An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods." Neural Information Processing Systems, 2020.
Markdown
[Liu et al. "An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/liu2020neurips-improved/)
BibTeX
@inproceedings{liu2020neurips-improved,
title = {{An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods}},
author = {Liu, Yanli and Zhang, Kaiqing and Basar, Tamer and Yin, Wotao},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/liu2020neurips-improved/}
}