Muesli: Combining Improvements in Policy Optimization

Abstract

We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero’s state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
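The abstract's "regularized policy optimization" refers to Muesli's policy loss, which combines a policy-gradient term with a regularizer toward a clipped-MPO (CMPO) target policy. Below is a minimal numpy sketch of that loss for a discrete action space; the function name, shapes, and hyperparameters (`c`, `lam`) are illustrative assumptions, and the paper's full method additionally learns a model and uses it to estimate the action-value advantages.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def muesli_policy_loss(logits, prior_logits, advantages, actions, c=1.0, lam=1.0):
    """Sketch of a CMPO-regularized policy loss (names/shapes illustrative).

    logits, prior_logits: [batch, num_actions] for current and prior policies.
    advantages: [batch, num_actions] advantage estimates for every action.
    actions: [batch] sampled actions for the policy-gradient term.
    """
    pi = softmax(logits)
    prior = softmax(prior_logits)
    # CMPO target policy: reweight the prior by clipped exponentiated advantages.
    cmpo = prior * np.exp(np.clip(advantages, -c, c))
    cmpo /= cmpo.sum(axis=-1, keepdims=True)
    idx = np.arange(len(actions))
    # Policy-gradient term on the taken actions (advantages treated as constants).
    pg_loss = -(advantages[idx, actions] * np.log(pi[idx, actions])).mean()
    # Regularization toward the CMPO target: KL(pi_cmpo || pi).
    kl = (cmpo * (np.log(cmpo) - np.log(pi))).sum(axis=-1).mean()
    return pg_loss + lam * kl
```

Because the agent acts directly with the policy network, this loss is evaluated without any deep search at acting time, which is what gives Muesli computation speed comparable to model-free baselines.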

Cite

Text

Hessel et al. "Muesli: Combining Improvements in Policy Optimization." International Conference on Machine Learning, 2021.

Markdown

[Hessel et al. "Muesli: Combining Improvements in Policy Optimization." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/hessel2021icml-muesli/)

BibTeX

@inproceedings{hessel2021icml-muesli,
  title     = {{Muesli: Combining Improvements in Policy Optimization}},
  author    = {Hessel, Matteo and Danihelka, Ivo and Viola, Fabio and Guez, Arthur and Schmitt, Simon and Sifre, Laurent and Weber, Theophane and Silver, David and Van Hasselt, Hado},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {4214--4226},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/hessel2021icml-muesli/}
}