BLaDE: Robust Exploration via Diffusion Models

Abstract

We present Bootstrap your own Latents with Diffusion models for Exploration (BLaDE), a general approach for curiosity-driven exploration in complex, partially observable and stochastic environments. BLaDE is a natural extension of Bootstrap Your Own Latents for Exploration (BYOL-Explore), a multi-step, latent-level prediction-error method that learns a world representation and the world dynamics, and provides an intrinsic reward, all by optimizing a single prediction loss with no auxiliary objective. Whereas BYOL-Explore predicts future latents from past latents and future open-loop actions, BLaDE uses a diffusion model to predict future latents from past observations, future open-loop actions and a noisy version of the future latents. Leaking information about the future latents makes it possible to control the variance of the distribution of future latents, which makes the method agnostic to stochastic traps. Our experiments on several noisy versions of Montezuma’s Revenge show that BLaDE handles stochasticity better than Random Network Distillation, the Intrinsic Curiosity Module and BYOL-Explore, without degrading performance relative to BYOL-Explore on the original, noise-free and fairly deterministic Montezuma’s Revenge.
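The variance-control argument can be illustrated with a toy Python sketch. This is not the authors' implementation, and several assumptions are made. The first is a DDPM-style forward noising z_noisy = sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * eps, since the abstract does not specify the noise schedule or parameterization. The second is a "noisy TV" future latent that is pure Gaussian noise with variance sigma^2. Because that latent is independent of past observations and actions, the conditioning on them is omitted. The third is a closed-form optimal (posterior-mean) denoiser standing in for the learned diffusion predictor. All helper names (`noise_latent`, `posterior_mean`, `mean_reward`, `expected_reward`) are hypothetical. With alpha_bar = 0 nothing about the future leaks, which is roughly the BYOL-Explore situation, and the squared-error bonus stays at the trap's irreducible variance. As alpha_bar grows, the conditional variance sigma^2 (1 - alpha_bar) / (alpha_bar sigma^2 + 1 - alpha_bar) shrinks, and the bonus shrinks with it.

```python
import math
import random


def noise_latent(z, alpha_bar, rng):
    """Assumed DDPM-style forward noising (not taken from the paper):
    z_noisy = sqrt(alpha_bar) * z + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1) per dimension."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * zi + b * rng.gauss(0.0, 1.0) for zi in z]


def posterior_mean(z_noisy, alpha_bar, prior_var):
    """Hypothetical stand-in for the learned predictor: the MMSE estimate E[z | z_noisy]
    when z ~ N(0, prior_var) independently per dimension."""
    gain = math.sqrt(alpha_bar) * prior_var / (alpha_bar * prior_var + 1.0 - alpha_bar)
    return [gain * y for y in z_noisy]


def intrinsic_reward(pred, target):
    """Squared prediction error, used here as the curiosity bonus."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def mean_reward(alpha_bar, prior_var=1.0, dim=8, trials=2000, seed=0):
    """Monte Carlo average bonus on a pure-noise ('noisy TV') future latent.
    alpha_bar = 0 leaks nothing, so the prediction collapses to the prior mean 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        z_future = [rng.gauss(0.0, math.sqrt(prior_var)) for _ in range(dim)]
        z_noisy = noise_latent(z_future, alpha_bar, rng)
        pred = posterior_mean(z_noisy, alpha_bar, prior_var)
        total += intrinsic_reward(pred, z_future)
    return total / trials


def expected_reward(alpha_bar, prior_var=1.0, dim=8):
    """Closed-form expected bonus: dim times the posterior variance of z given z_noisy."""
    return dim * prior_var * (1.0 - alpha_bar) / (alpha_bar * prior_var + 1.0 - alpha_bar)


if __name__ == "__main__":
    for ab in (0.0, 0.5, 0.9, 0.99):
        print(f"alpha_bar={ab}: simulated={mean_reward(ab):.3f}, closed form={expected_reward(ab):.3f}")
```

Running the script prints, for each noise level, the simulated bonus next to its closed-form value. How BLaDE chooses or samples noise levels during training is not described in the abstract and is not modeled here.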

Cite

Text

Piot et al. "BLaDE: Robust Exploration via Diffusion Models." NeurIPS 2022 Workshops: DeepRL, 2022.

Markdown

[Piot et al. "BLaDE: Robust Exploration via Diffusion Models." NeurIPS 2022 Workshops: DeepRL, 2022.](https://mlanthology.org/neuripsw/2022/piot2022neuripsw-blade/)

BibTeX

@inproceedings{piot2022neuripsw-blade,
  title     = {{BLaDE: Robust Exploration via Diffusion Models}},
  author    = {Piot, Bilal and Guo, Zhaohan Daniel and Thakoor, Shantanu and Azar, Mohammad Gheshlaghi},
  booktitle = {NeurIPS 2022 Workshops: DeepRL},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/piot2022neuripsw-blade/}
}