Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Abstract

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
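The abstract describes per-parameter gain adaptation via stochastic meta-descent (SMD) using fast Hessian-vector products. Below is a minimal, hedged sketch of that idea for noisy gradient ascent on a toy concave quadratic "reward"; the objective, noise model, and hyperparameter values are illustrative assumptions, not the paper's experimental setup or exact update rules.

# Sketch of SMD gain-vector adaptation for noisy gradient ascent.
# Assumed toy problem: R(theta) = -0.5 * theta^T A theta + b^T theta (concave),
# with an exact Hessian-vector product standing in for the fast products used in the paper.
import numpy as np

rng = np.random.default_rng(0)

A = np.diag([100.0, 1.0])          # badly conditioned curvature
b = np.array([1.0, 1.0])

def noisy_grad(theta):
    """Stochastic estimate of the ascent gradient dR/dtheta."""
    return b - A @ theta + 0.1 * rng.standard_normal(theta.shape)

def hess_vec(theta, v):
    """Hessian-vector product (d^2 R / dtheta^2) @ v, here available in closed form."""
    return -A @ v

theta = np.zeros(2)                # parameters being optimized
eta = np.full(2, 0.01)             # per-parameter gain vector
v = np.zeros(2)                    # v approximates d theta / d log(eta)
mu, lam = 0.05, 0.99               # meta-gain and decay factor for v

for t in range(2000):
    g = noisy_grad(theta)
    # Multiplicative gain update, clipped below at 1/2 for stability.
    eta *= np.maximum(0.5, 1.0 + mu * v * g)
    # Gradient ascent step with per-parameter gains.
    theta += eta * g
    # Propagate v using a Hessian-vector product instead of the full Hessian.
    v = lam * v + eta * (g + lam * hess_vec(theta, v))

print("theta:", theta, "optimum:", np.linalg.solve(A, b))

Running the sketch shows the gains growing along the shallow direction and staying small along the stiff one, which is the behavior gain-vector adaptation is meant to provide on ill-conditioned problems.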

Cite

Text

Yu et al. "Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation." Neural Information Processing Systems, 2005.

Markdown

[Yu et al. "Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation." Neural Information Processing Systems, 2005.](https://mlanthology.org/neurips/2005/yu2005neurips-fast/)

BibTeX

@inproceedings{yu2005neurips-fast,
  title     = {{Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation}},
  author    = {Yu, Jin and Aberdeen, Douglas and Schraudolph, Nicol N.},
  booktitle = {Neural Information Processing Systems},
  year      = {2005},
  pages     = {1185--1192},
  url       = {https://mlanthology.org/neurips/2005/yu2005neurips-fast/}
}