Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models

Abstract

Model-based reinforcement learning (RL) algorithms can attain excellent sample efficiency, but often lag behind the best model-free algorithms in terms of asymptotic performance. This is especially true with high-capacity parametric function approximators, such as deep networks. In this paper, we study how to bridge this gap by employing uncertainty-aware dynamics models. We propose a new algorithm called probabilistic ensembles with trajectory sampling (PETS) that combines uncertainty-aware deep network dynamics models with sampling-based uncertainty propagation. Our comparison to state-of-the-art model-based and model-free deep RL algorithms shows that our approach matches the asymptotic performance of model-free algorithms on several challenging benchmark tasks, while requiring significantly fewer samples (e.g., 8 and 125 times fewer samples than Soft Actor Critic and Proximal Policy Optimization, respectively, on the half-cheetah task).
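
Below is a minimal NumPy sketch of the planning loop the abstract describes: an ensemble of probabilistic dynamics models whose Gaussian predictions are propagated by trajectory sampling inside a model-predictive controller. The linear-Gaussian ProbabilisticModel, the reward_fn, and the random-shooting planner are illustrative stand-ins for the paper's method (PETS trains bootstrapped neural network models and plans with CEM), and constants such as ENSEMBLE_SIZE and HORIZON are arbitrary choices for the sketch.

# Minimal PETS-style planning sketch (assumptions noted above): a probabilistic
# ensemble, trajectory sampling for uncertainty propagation, and shooting-based MPC.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 3, 1
ENSEMBLE_SIZE = 5        # bootstrap members in the probabilistic ensemble
PARTICLES = 20           # particles used for trajectory sampling
HORIZON = 15             # planning horizon of the MPC controller
CANDIDATES = 200         # random-shooting candidate action sequences


class ProbabilisticModel:
    """Stand-in for one bootstrap member: predicts mean and variance of the next state."""

    def __init__(self, rng):
        # Random linear dynamics plus fixed per-dimension noise, purely illustrative.
        self.A = np.eye(STATE_DIM) + 0.05 * rng.normal(size=(STATE_DIM, STATE_DIM))
        self.B = 0.1 * rng.normal(size=(STATE_DIM, ACTION_DIM))
        self.log_var = -3.0 + 0.1 * rng.normal(size=STATE_DIM)

    def predict(self, states, actions):
        mean = states @ self.A.T + actions @ self.B.T
        var = np.exp(self.log_var) * np.ones_like(mean)
        return mean, var


def reward_fn(states, actions):
    """Placeholder reward: stay near the origin while using small actions."""
    return -np.sum(states ** 2, axis=-1) - 0.01 * np.sum(actions ** 2, axis=-1)


def evaluate_sequences(state, action_seqs, ensemble, rng):
    """Trajectory sampling: each particle is tied to one bootstrap model and is
    propagated by sampling from that model's Gaussian prediction at every step."""
    n_seq = action_seqs.shape[0]
    particles = np.tile(state, (n_seq, PARTICLES, 1))          # (n_seq, P, state_dim)
    model_idx = rng.integers(ENSEMBLE_SIZE, size=(n_seq, PARTICLES))
    returns = np.zeros((n_seq, PARTICLES))
    for t in range(HORIZON):
        actions = np.repeat(action_seqs[:, t:t + 1, :], PARTICLES, axis=1)
        next_states = np.empty_like(particles)
        for b in range(ENSEMBLE_SIZE):
            mask = model_idx == b
            if not mask.any():
                continue
            mean, var = ensemble[b].predict(particles[mask], actions[mask])
            next_states[mask] = mean + np.sqrt(var) * rng.normal(size=mean.shape)
        returns += reward_fn(next_states, actions)
        particles = next_states
    return returns.mean(axis=1)                                # average over particles


def plan_action(state, ensemble, rng):
    """Random-shooting MPC: score candidate action sequences with trajectory
    sampling and execute the first action of the best sequence."""
    candidates = rng.uniform(-1.0, 1.0, size=(CANDIDATES, HORIZON, ACTION_DIM))
    scores = evaluate_sequences(state, candidates, ensemble, rng)
    return candidates[np.argmax(scores), 0]


ensemble = [ProbabilisticModel(rng) for _ in range(ENSEMBLE_SIZE)]
state = rng.normal(size=STATE_DIM)
print("first planned action:", plan_action(state, ensemble, rng))

Tying each particle to a single bootstrap member for the whole rollout (as in the paper's TS-infinity propagation) lets per-step Gaussian sampling capture aleatoric uncertainty while disagreement between ensemble members captures epistemic uncertainty.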

Cite

Text

Chua et al. "Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models." Neural Information Processing Systems, 2018.

Markdown

[Chua et al. "Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/chua2018neurips-deep/)

BibTeX

@inproceedings{chua2018neurips-deep,
  title     = {{Deep Reinforcement Learning in a Handful of Trials Using Probabilistic Dynamics Models}},
  author    = {Chua, Kurtland and Calandra, Roberto and McAllister, Rowan and Levine, Sergey},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {4754--4765},
  url       = {https://mlanthology.org/neurips/2018/chua2018neurips-deep/}
}