PILCO: A Model-Based and Data-Efficient Approach to Policy Search

Abstract

In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
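The loop the abstract describes — fit a probabilistic (Gaussian process) dynamics model from a handful of trials, evaluate the policy by simulating the model forward, then improve the policy by gradient descent on the predicted long-term cost — can be sketched in miniature. This is a toy illustration, not the paper's method: real PILCO propagates full state *distributions* through the GP via exact moment matching and computes policy gradients analytically, whereas this sketch propagates only the posterior mean and uses finite differences. The dynamics, kernel settings, and controller here are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown true 1-D dynamics the agent must learn: x' = 0.9*x + 0.5*u + noise.
def true_step(x, u):
    return 0.9 * x + 0.5 * u + 0.01 * rng.normal()

def policy(theta, x):
    return theta * x  # linear state-feedback controller (toy stand-in for PILCO's RBF policy)

# --- 1) Collect a small batch of interaction data (PILCO needs very little) ---
X, Y = [], []
for _ in range(30):
    x, u = rng.uniform(-2, 2), rng.uniform(-2, 2)
    X.append([x, u])
    Y.append(true_step(x, u))
X, Y = np.array(X), np.array(Y)

# --- 2) Fit a GP regression model of the dynamics (squared-exponential kernel) ---
def rbf(A, B, ell=1.0, sf2=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf2 * np.exp(-0.5 * d2 / ell**2)

sn2 = 1e-4  # observation-noise variance
K = rbf(X, X) + sn2 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)

def gp_mean(xu):
    # GP posterior mean at a query (x, u); full PILCO would also track the variance
    return rbf(np.atleast_2d(xu), X) @ alpha

# --- 3) Policy evaluation: roll the learned model forward, accumulate cost ---
def predicted_cost(theta, x0=1.5, horizon=10):
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = policy(theta, x)
        x = float(gp_mean([x, u])[0])  # mean-only propagation (moment matching in the paper)
        cost += x**2                   # quadratic state cost: drive x to 0
    return cost

# --- 4) Policy improvement: gradient descent on the model-predicted cost ---
# (finite differences here; PILCO differentiates the rollout analytically)
theta, lr, eps = 0.0, 0.02, 1e-4
initial_cost = predicted_cost(theta)
for _ in range(50):
    g = (predicted_cost(theta + eps) - predicted_cost(theta - eps)) / (2 * eps)
    theta -= lr * g
final_cost = predicted_cost(theta)
```

All planning happens inside the learned model, so no further interaction with the true system is needed to improve the policy; this is the source of the data efficiency the abstract emphasizes. What the sketch omits is the paper's key contribution: by carrying model *uncertainty* through the rollout rather than just the mean, PILCO avoids exploiting errors in its own model (model bias).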

Cite

Text

Deisenroth and Rasmussen. "PILCO: A Model-Based and Data-Efficient Approach to Policy Search." International Conference on Machine Learning, 2011.

Markdown

[Deisenroth and Rasmussen. "PILCO: A Model-Based and Data-Efficient Approach to Policy Search." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/deisenroth2011icml-pilco/)

BibTeX

@inproceedings{deisenroth2011icml-pilco,
  title     = {{PILCO: A Model-Based and Data-Efficient Approach to Policy Search}},
  author    = {Deisenroth, Marc Peter and Rasmussen, Carl Edward},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {465--472},
  url       = {https://mlanthology.org/icml/2011/deisenroth2011icml-pilco/}
}