VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

Zintgraf, Luisa; Shiarlis, Kyriacos; Igl, Maximilian; Schulze, Sebastian; Gal, Yarin; Hofmann, Katja; Whiteson, Shimon

ICLR 2020

Abstract

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but also on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.
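
As a rough illustration of what conditioning actions on task uncertainty can mean, the sketch below maintains an exact Bayesian belief over which cell of a tiny grid holds a hidden goal and picks the next cell to visit from that belief. It is not variBAD itself: variBAD meta-learns approximate inference with neural networks, whereas this toy uses exact inference and a greedy rule. The names `update_belief` and `choose_cell`, the four-cell grid, and the uniform prior are all assumptions made for illustration.

```python
def update_belief(belief, visited_cell, found_goal):
    """Exact Bayesian update of a belief over which cell holds the goal.

    `belief` is a list of probabilities, one per cell (hypothetical setup).
    """
    if found_goal:
        # The goal location is now known with certainty.
        return [1.0 if i == visited_cell else 0.0 for i in range(len(belief))]
    # Goal not found here: rule this cell out and renormalise the rest.
    posterior = [0.0 if i == visited_cell else p for i, p in enumerate(belief)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observation is inconsistent with the current belief")
    return [p / total for p in posterior]


def choose_cell(belief):
    """Toy belief-conditioned policy: visit the most probable cell."""
    return max(range(len(belief)), key=lambda i: belief[i])


# Uniform prior over four cells; the agent visits cell 0 and finds nothing.
belief = [0.25, 0.25, 0.25, 0.25]
belief = update_belief(belief, visited_cell=0, found_goal=False)
next_cell = choose_cell(belief)
```

After the unsuccessful visit, the belief places no mass on cell 0 and splits the remainder evenly across the other cells, so the toy policy never revisits a cell it has already ruled out. This is the kind of structured exploration that follows from acting on a belief rather than on the state alone.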

Cite

Text

Zintgraf et al. "VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning." International Conference on Learning Representations, 2020.

Markdown

[Zintgraf et al. "VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/zintgraf2020iclr-varibad/)

BibTeX

@inproceedings{zintgraf2020iclr-varibad,
  title     = {{VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning}},
  author    = {Zintgraf, Luisa and Shiarlis, Kyriacos and Igl, Maximilian and Schulze, Sebastian and Gal, Yarin and Hofmann, Katja and Whiteson, Shimon},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/zintgraf2020iclr-varibad/}
}