BLAST: Latent Dynamics Models from Bootstrapping
Abstract
State-of-the-art world models such as DreamerV2 have significantly improved the capabilities of model-based reinforcement learning. However, these approaches typically rely on a reconstruction loss to shape their latent representations, which is known to fail in environments with high-fidelity visual observations. Previous work has found that when latent dynamics models are trained without a reconstruction loss, using only the signal provided by the reward, performance can drop dramatically. We present a simple set of modifications to DreamerV2, inspired by the recent self-supervised learning method Bootstrap Your Own Latent, that removes its reliance on reconstruction. The combination of adding a stop-gradient to the posterior, using a powerful autoregressive model for the prior, and using a slowly updated target encoder, which we call BLAST, allows the world model to learn from signals present in both the reward and the observations, improving efficiency on our tested environment while being significantly more robust to visual distractors.
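Since the abstract only names the three modifications, the sketch below is one plausible reading rather than the paper's actual implementation: all architectures, dimensions, the squared-error consistency loss, and the exact wiring between the online and target encoders are assumptions. Only the three ideas themselves (posterior stop-gradient, autoregressive prior, EMA target encoder) come from the text.

```python
import copy
import torch
import torch.nn as nn

# Illustrative sketch of the three BLAST modifications named in the abstract.
# Module shapes and the loss form are assumptions, not the paper's design.

class Encoder(nn.Module):
    def __init__(self, obs_dim=64, latent_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ELU(), nn.Linear(128, latent_dim)
        )

    def forward(self, obs):
        return self.net(obs)

class AutoregressivePrior(nn.Module):
    """Predicts the next latent from a recurrent state (a GRU here; the
    paper's choice of autoregressive model may differ)."""
    def __init__(self, latent_dim=32, hidden_dim=128):
        super().__init__()
        self.rnn = nn.GRUCell(latent_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, z_prev, h_prev):
        h = self.rnn(z_prev, h_prev)
        return self.head(h), h

encoder = Encoder()
prior = AutoregressivePrior()

# Slowly updated target encoder, as in BYOL: a frozen copy refreshed by EMA.
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)

def ema_update(online, target, tau=0.995):
    """Move target parameters slowly toward the online parameters."""
    with torch.no_grad():
        for p_o, p_t in zip(online.parameters(), target.parameters()):
            p_t.mul_(tau).add_(p_o, alpha=1.0 - tau)

def latent_loss(obs, z_prev, h_prev):
    # Posterior target with a stop-gradient: the prior is trained to match
    # it, but no gradient flows back into the (target) encoder.
    with torch.no_grad():
        z_post = target_encoder(obs)
    z_prior, h = prior(z_prev, h_prev)
    return ((z_prior - z_post) ** 2).mean(), h

# Hypothetical usage on a batch of flattened observations.
obs = torch.randn(16, 64)
z_prev = torch.randn(16, 32)
h_prev = torch.zeros(16, 128)
loss, h = latent_loss(obs, z_prev, h_prev)
loss.backward()
ema_update(encoder, target_encoder)
```

In this reading the consistency loss trains only the prior; the online encoder would be shaped by other heads such as reward prediction (omitted here), which matches the abstract's claim that the model learns from signals in both the reward and the observations.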
Cite
Text
Paster et al. "BLAST: Latent Dynamics Models from Bootstrapping." NeurIPS 2021 Workshops: DeepRL, 2021.
Markdown
[Paster et al. "BLAST: Latent Dynamics Models from Bootstrapping." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/paster2021neuripsw-blast/)
BibTeX
@inproceedings{paster2021neuripsw-blast,
title = {{BLAST: Latent Dynamics Models from Bootstrapping}},
author = {Paster, Keiran and McKinney, Lev E and McIlraith, Sheila A. and Ba, Jimmy},
booktitle = {NeurIPS 2021 Workshops: DeepRL},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/paster2021neuripsw-blast/}
}