Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks

Abstract

We present an algorithm for model-based reinforcement learning that combines Bayesian neural networks (BNNs) with random roll-outs and stochastic optimization for policy learning. The BNNs are trained by minimizing $\alpha$-divergences, allowing us to capture complicated statistical patterns in the transition dynamics, e.g. multi-modality and heteroskedasticity, which are usually missed by other common modeling approaches. We illustrate the performance of our method by solving a challenging benchmark where model-based approaches usually fail and by obtaining promising results in a real-world scenario for controlling a gas turbine.
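
The abstract describes the method only at a high level; the sketch below illustrates that loop (collect transitions, fit a probabilistic dynamics model, optimize a policy by gradient descent through stochastic roll-outs) under strong simplifying assumptions. The toy 1-D system, the network sizes, the latent-noise-input surrogate for the BNN, and the plain maximum-likelihood fit used in place of $\alpha$-divergence minimization are all illustrative choices, not the authors' implementation.

# Minimal sketch of the model-based RL loop from the abstract (hypothetical
# details throughout; not the paper's code).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stochastic dynamical system with heteroskedastic noise.
def true_step(s, a):
    noise = torch.randn_like(s) * (0.05 + 0.1 * s.abs())
    return s + 0.1 * a - 0.02 * s + noise

# Stand-in for the BNN dynamics model: a network fed (state, action, latent noise z),
# so sampling z yields stochastic roll-outs. Trained below with a simple squared-error
# surrogate instead of alpha-divergence minimization.
dyn = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))

def model_step(s, a):
    z = torch.randn_like(s)                       # latent noise sample
    return s + dyn(torch.cat([s, a, z], dim=-1))

# Parametric policy mapping state to a bounded action.
policy = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1), nn.Tanh())

def rollout_cost(s0, horizon=20):
    # Expected cost of stochastic roll-outs simulated through the learned model.
    s, cost = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s = model_step(s, a)
        cost = cost + (s ** 2).mean()             # drive the state toward zero
    return cost / horizon

# 1) Collect transitions from the real system under random actions.
s = torch.randn(512, 1)
a = torch.rand(512, 1) * 2 - 1
s_next = true_step(s, a)

# 2) Fit the dynamics model (simplified maximum-likelihood surrogate).
opt_dyn = torch.optim.Adam(dyn.parameters(), lr=1e-2)
for _ in range(500):
    z = torch.randn_like(s)
    pred = s + dyn(torch.cat([s, a, z], dim=-1))
    loss = ((pred - s_next) ** 2).mean()
    opt_dyn.zero_grad(); loss.backward(); opt_dyn.step()

# 3) Policy search: stochastic optimization of the policy through roll-outs
#    simulated with the (now frozen) learned model.
for p in dyn.parameters():
    p.requires_grad_(False)
opt_pi = torch.optim.Adam(policy.parameters(), lr=1e-2)
for it in range(200):
    cost = rollout_cost(torch.randn(256, 1))
    opt_pi.zero_grad(); cost.backward(); opt_pi.step()
    if it % 50 == 0:
        print(f"iter {it:3d}  expected roll-out cost {cost.item():.4f}")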

Cite

Text

Depeweg et al. "Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks." International Conference on Learning Representations, 2017.

Markdown

[Depeweg et al. "Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/depeweg2017iclr-learning/)

BibTeX

@inproceedings{depeweg2017iclr-learning,
  title     = {{Learning and Policy Search in Stochastic Dynamical Systems with Bayesian Neural Networks}},
  author    = {Depeweg, Stefan and Hernández-Lobato, José Miguel and Doshi-Velez, Finale and Udluft, Steffen},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/depeweg2017iclr-learning/}
}