Momentum-Based Policy Gradient Methods
Abstract
In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance-reduction technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance-reduction technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary point of the nonconcave performance function, while requiring only one trajectory at each iteration. In particular, we present a non-adaptive version of the IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(\epsilon^{-3})$ without any large batches. In the experiments, we use four benchmark tasks to demonstrate the effectiveness of our algorithms.
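As a rough illustration of the momentum-based variance-reduction idea behind this class of methods (a sketch only; the symbols $u_t$, $\beta_t$, $\eta_t$, and $w$ are illustrative and need not match the paper's exact notation), a STORM-style single-trajectory update with an importance-sampling correction takes the form

$$
u_t = \beta_t\, \hat{\nabla} J(\theta_t; \tau_t) + (1-\beta_t)\Big[u_{t-1} + \hat{\nabla} J(\theta_t; \tau_t) - w(\tau_t \mid \theta_{t-1}, \theta_t)\, \hat{\nabla} J(\theta_{t-1}; \tau_t)\Big], \qquad \theta_{t+1} = \theta_t + \eta_t\, u_t,
$$

where $\tau_t$ is a single trajectory sampled from the current policy $\pi_{\theta_t}$, $\hat{\nabla} J$ is a REINFORCE/GPOMDP-style gradient estimator, and $w(\tau_t \mid \theta_{t-1}, \theta_t) = \prod_h \pi_{\theta_{t-1}}(a_h \mid s_h)/\pi_{\theta_t}(a_h \mid s_h)$ reweights the estimator evaluated at the previous iterate. Setting $\beta_t = 1$ recovers the vanilla policy gradient update; $\beta_t < 1$ mixes in a correction term that reduces variance without requiring large batches.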
Cite
Text
Huang et al. "Momentum-Based Policy Gradient Methods." International Conference on Machine Learning, 2020.
Markdown
[Huang et al. "Momentum-Based Policy Gradient Methods." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/huang2020icml-momentumbased/)
BibTeX
@inproceedings{huang2020icml-momentumbased,
title = {{Momentum-Based Policy Gradient Methods}},
author = {Huang, Feihu and Gao, Shangqian and Pei, Jian and Huang, Heng},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {4422-4433},
volume = {119},
url = {https://mlanthology.org/icml/2020/huang2020icml-momentumbased/}
}