Bayesian Policy Gradient Algorithms

Abstract

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte-Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.
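
A rough intuition for the variance reduction described in the abstract can be seen with Bayesian quadrature on a toy one-dimensional expectation. The sketch below is only an illustration under assumed choices (a made-up integrand f, a squared-exponential kernel, NumPy); the paper itself places the Gaussian process over the policy-gradient integrand using Fisher-information-based kernels over trajectories, which is not reproduced here.

# Illustration only (not the paper's algorithm): Bayesian quadrature vs. Monte Carlo
# for estimating E_p[f(x)] with p = N(0, 1). A GP prior over the integrand lets a few
# samples produce a low-variance estimate; the paper applies the same idea to the
# policy-gradient integral.
import numpy as np

rng = np.random.default_rng(0)

def f(x):                                   # toy integrand (assumption); stands in for the score-weighted return
    return np.sin(3 * x) + x ** 2

truth = np.mean(f(rng.standard_normal(1_000_000)))   # near-exact reference value

n, ell, jitter = 15, 0.5, 1e-6              # few samples, SE kernel length-scale, numerical jitter
x = rng.standard_normal(n)                  # samples drawn from p
y = f(x)

mc_estimate = y.mean()                      # plain Monte-Carlo average

# Bayesian quadrature: posterior mean of the integral under a zero-mean GP prior on f.
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2)) + jitter * np.eye(n)
# z_i = integral of k(x, x_i) N(x; 0, 1) dx has a closed form for the SE kernel.
z = np.sqrt(ell ** 2 / (ell ** 2 + 1)) * np.exp(-x ** 2 / (2 * (ell ** 2 + 1)))
bq_estimate = z @ np.linalg.solve(K, y)

print(f"reference value      : {truth:.4f}")
print(f"Monte Carlo  (n={n}) : {mc_estimate:.4f}")
print(f"Bayes. quadrature    : {bq_estimate:.4f}")

The same Gaussian-process posterior also yields a variance for the estimated integral, which mirrors the abstract's point that a measure of uncertainty in the gradient estimate comes at little extra cost.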

Cite

Text

Ghavamzadeh and Engel. "Bayesian Policy Gradient Algorithms." Neural Information Processing Systems, 2006.

Markdown

[Ghavamzadeh and Engel. "Bayesian Policy Gradient Algorithms." Neural Information Processing Systems, 2006.](https://mlanthology.org/neurips/2006/ghavamzadeh2006neurips-bayesian/)

BibTeX

@inproceedings{ghavamzadeh2006neurips-bayesian,
  title     = {{Bayesian Policy Gradient Algorithms}},
  author    = {Ghavamzadeh, Mohammad and Engel, Yaakov},
  booktitle = {Neural Information Processing Systems},
  year      = {2006},
  pages     = {457--464},
  url       = {https://mlanthology.org/neurips/2006/ghavamzadeh2006neurips-bayesian/}
}