Muesli: Combining Improvements in Policy Optimization
Abstract
We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero’s state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.
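The abstract describes a policy update that combines a regularized policy-optimization objective with model learning as an auxiliary loss. The following is a minimal illustrative sketch of that general structure, not the paper's actual objective: the function names, regularizer choice (a KL term toward a target policy), and auxiliary squared-error model loss are all assumptions made for illustration.

```python
import numpy as np

def regularized_policy_loss(logits, target_logits, advantages, actions,
                            model_pred, model_target,
                            reg_weight=1.0, aux_weight=1.0):
    """Illustrative sketch (not the paper's exact update): a policy-gradient
    term, a KL regularizer toward a target policy, and an auxiliary
    model-learning loss combined into one objective."""
    def softmax(x):
        e = np.exp(x - x.max(-1, keepdims=True))
        return e / e.sum(-1, keepdims=True)

    pi = softmax(logits)                # current policy
    target_pi = softmax(target_logits)  # regularization target policy

    # Policy-gradient term: -advantage * log pi(a|s), averaged over the batch.
    logp = np.log(pi[np.arange(len(actions)), actions])
    pg_loss = -(advantages * logp).mean()

    # Regularizer: KL(target || pi), pulling the policy toward the target.
    kl = (target_pi * (np.log(target_pi) - np.log(pi))).sum(-1).mean()

    # Auxiliary model loss (hypothetical): squared error on model predictions
    # such as values or rewards, learned alongside the policy.
    aux_loss = ((model_pred - model_target) ** 2).mean()

    return pg_loss + reg_weight * kl + aux_weight * aux_loss
```

Acting then uses the policy network directly (no search at decision time), which is why the computation cost is comparable to model-free agents.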
Cite
Text
Hessel et al. "Muesli: Combining Improvements in Policy Optimization." International Conference on Machine Learning, 2021.
Markdown
[Hessel et al. "Muesli: Combining Improvements in Policy Optimization." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/hessel2021icml-muesli/)
BibTeX
@inproceedings{hessel2021icml-muesli,
title = {{Muesli: Combining Improvements in Policy Optimization}},
author = {Hessel, Matteo and Danihelka, Ivo and Viola, Fabio and Guez, Arthur and Schmitt, Simon and Sifre, Laurent and Weber, Theophane and Silver, David and van Hasselt, Hado},
booktitle = {International Conference on Machine Learning},
year = {2021},
pages = {4214--4226},
volume = {139},
url = {https://mlanthology.org/icml/2021/hessel2021icml-muesli/}
}