Bayesian Actor-Critic Algorithms
Abstract
We present a new actor-critic learning model that uses a Bayesian class of non-parametric critics based on Gaussian process temporal difference learning. Such critics model the state-action value function as a Gaussian process, allowing Bayes' rule to be used in computing the posterior distribution over state-action value functions, conditioned on the observed data. Appropriate choices of the prior covariance (kernel) between state-action values and of the parametrization of the policy allow us to obtain closed-form expressions for the posterior distribution of the gradient of the average discounted return with respect to the policy parameters. The posterior mean, which serves as our estimate of the policy gradient, is used to update the policy, while the posterior covariance allows us to gauge the reliability of the update.
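The update rule described in the abstract can be illustrated with a minimal sketch: the policy parameters move along the posterior mean of the gradient, while the posterior covariance serves as a reliability measure. This is a hypothetical simplification using an empirical mean and covariance over per-trajectory gradient estimates; the paper's actual method derives the posterior in closed form from a GP critic, which is not reproduced here.

```python
import numpy as np

def bayesian_policy_gradient_step(grad_samples, theta, lr=0.1):
    """Illustrative Bayesian-style policy update (hypothetical simplification).

    grad_samples: per-trajectory gradient estimates, shape (n_samples, dim).
    Uses their empirical mean as a surrogate for the posterior mean of the
    policy gradient, and the trace of their covariance as a scalar gauge of
    the update's reliability (larger trace = less reliable).
    """
    g = np.asarray(grad_samples, dtype=float)
    mean = g.mean(axis=0)                   # surrogate posterior mean
    cov = np.atleast_2d(np.cov(g, rowvar=False))  # surrogate posterior covariance
    theta_new = np.asarray(theta, dtype=float) + lr * mean
    reliability = np.trace(cov)             # uncertainty summary
    return theta_new, mean, reliability
```

In the paper's formulation the mean and covariance come analytically from the GP posterior rather than from sample statistics, but the role of each quantity in the update is the same.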
Cite
Text
Ghavamzadeh and Engel. "Bayesian Actor-Critic Algorithms." International Conference on Machine Learning, 2007. doi:10.1145/1273496.1273534
Markdown
[Ghavamzadeh and Engel. "Bayesian Actor-Critic Algorithms." International Conference on Machine Learning, 2007.](https://mlanthology.org/icml/2007/ghavamzadeh2007icml-bayesian/) doi:10.1145/1273496.1273534
BibTeX
@inproceedings{ghavamzadeh2007icml-bayesian,
title = {{Bayesian Actor-Critic Algorithms}},
author = {Ghavamzadeh, Mohammad and Engel, Yaakov},
booktitle = {International Conference on Machine Learning},
year = {2007},
  pages = {297--304},
doi = {10.1145/1273496.1273534},
url = {https://mlanthology.org/icml/2007/ghavamzadeh2007icml-bayesian/}
}