Unlocking the Power of Representations in Long-Term Novelty-Based Exploration
Abstract
We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction and which, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".
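The abstract does not give RECODE's update rules, so the following is only a toy sketch of the general idea it describes: keep a bounded set of clusters in an embedding space, count visits per cluster, and turn low counts into a novelty bonus. The class name `ClusterCounter`, the parameters `radius`, `max_clusters`, and `decay`, the least-visited eviction rule, and the `1/sqrt(count)` reward are all assumptions made for illustration, not the paper's algorithm.

```python
import math


class ClusterCounter:
    """Toy cluster-based visitation counter (illustrative; not RECODE itself)."""

    def __init__(self, radius, max_clusters, decay=1.0):
        self.radius = radius              # assumed distance threshold for "similar" embeddings
        self.max_clusters = max_clusters  # assumed memory budget
        self.decay = decay                # assumed per-step count decay for nonstationarity
        self.centers = []
        self.counts = []

    def update(self, embedding):
        """Record one visit and return a novelty bonus 1/sqrt(count)."""
        # Decay every count so clusters that are no longer visited lose weight.
        self.counts = [c * self.decay for c in self.counts]

        # Find the nearest existing cluster center.
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(center, embedding)
            if d < best_dist:
                best, best_dist = i, d

        if best is not None and best_dist <= self.radius:
            # Close enough: increment the count and nudge the center toward the point.
            self.counts[best] += 1.0
            n = self.counts[best]
            self.centers[best] = [
                c + (x - c) / n for c, x in zip(self.centers[best], embedding)
            ]
        else:
            # Too far from every cluster: create a new one, evicting if memory is full.
            if len(self.centers) >= self.max_clusters:
                j = min(range(len(self.counts)), key=self.counts.__getitem__)
                del self.centers[j]
                del self.counts[j]
            self.centers.append(list(embedding))
            self.counts.append(1.0)
            best = len(self.counts) - 1

        return 1.0 / math.sqrt(self.counts[best])
```

In this sketch, repeated visits to nearby embeddings shrink the bonus, while an embedding far from every stored center receives the maximum bonus of 1.0. Setting `decay` below 1 is one simple way to let old counts fade, which loosely mirrors the nonstationarity concern the abstract raises.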
Cite
Text
Saade et al. "Unlocking the Power of Representations in Long-Term Novelty-Based Exploration." International Conference on Learning Representations, 2024.
Markdown
[Saade et al. "Unlocking the Power of Representations in Long-Term Novelty-Based Exploration." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/saade2024iclr-unlocking/)
BibTeX
@inproceedings{saade2024iclr-unlocking,
title = {{Unlocking the Power of Representations in Long-Term Novelty-Based Exploration}},
author = {Saade, Alaa and Kapturowski, Steven and Calandriello, Daniele and Blundell, Charles and Sprechmann, Pablo and Sarra, Leopoldo and Groth, Oliver and Valko, Michal and Piot, Bilal},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/saade2024iclr-unlocking/}
}