Reinforcement Learning with Gaussian Processes

Abstract

Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing two pressing issues, which were not adequately treated in the original GPTD paper (Engel et al., 2003). The first is the issue of stochasticity in the state transitions, and the second is concerned with action selection and policy improvement. We present a new generative model for the value function, deduced from its relation with the discounted return. We derive a corresponding on-line algorithm for learning the posterior moments of the value Gaussian process. We also present a SARSA-based extension of GPTD, termed GPSARSA, that allows the selection of actions and the gradual improvement of policies without requiring a world model.
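To make the abstract's generative model concrete, here is a minimal batch sketch of GPTD-style posterior inference. It assumes the linear-Gaussian observation model $r_t = V(x_t) - \gamma V(x_{t+1}) + n_t$ with a GP prior over $V$, an RBF kernel, and i.i.d. Gaussian noise; the kernel choice, hyperparameters, and the batch (rather than on-line, sparsified) formulation are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def rbf_kernel(x, y, length=1.0):
    """Squared-exponential prior covariance between state vectors x and y."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gptd_posterior(states, rewards, gamma=0.9, sigma2=0.1):
    """Batch posterior over V at the visited states.

    Observation model (GPTD): r = H v + n, where row t of H is
    e_t - gamma * e_{t+1}, encoding r_t = V(x_t) - gamma V(x_{t+1}) + n_t.
    Standard linear-Gaussian conditioning then gives the posterior moments.
    """
    T = len(rewards)                     # number of observed transitions
    K = rbf_kernel(states, states)       # prior covariance of V at the T+1 states
    H = np.zeros((T, T + 1))
    for t in range(T):
        H[t, t] = 1.0
        H[t, t + 1] = -gamma
    Q = H @ K @ H.T + sigma2 * np.eye(T)     # covariance of the rewards
    alpha = H.T @ np.linalg.solve(Q, rewards)
    v_mean = K @ alpha                       # posterior mean of V
    C = H.T @ np.linalg.solve(Q, H)
    v_cov = K - K @ C @ K                    # posterior covariance of V
    return v_mean, v_cov

# Toy 1-D chain: 5 states, 4 unit-reward transitions (hypothetical data).
states = np.linspace(0.0, 1.0, 5)
rewards = np.ones(4)
v_mean, v_cov = gptd_posterior(states, rewards)
```

The paper's contribution is an on-line, kernel-sparsified recursion for these same posterior moments; the O(T^3) batch solve above only shows which quantities are being tracked.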

Cite

Text

Engel et al. "Reinforcement Learning with Gaussian Processes." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102377

Markdown

[Engel et al. "Reinforcement Learning with Gaussian Processes." International Conference on Machine Learning, 2005.](https://mlanthology.org/icml/2005/engel2005icml-reinforcement/) doi:10.1145/1102351.1102377

BibTeX

@inproceedings{engel2005icml-reinforcement,
  title     = {{Reinforcement Learning with Gaussian Processes}},
  author    = {Engel, Yaakov and Mannor, Shie and Meir, Ron},
  booktitle = {International Conference on Machine Learning},
  year      = {2005},
  pages     = {201--208},
  doi       = {10.1145/1102351.1102377},
  url       = {https://mlanthology.org/icml/2005/engel2005icml-reinforcement/}
}