Goal-Conditioned Offline Planning from Curious Exploration

Abstract

Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.
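To make the idea of graph-based value aggregation more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that learned goal-conditioned values have been converted into pairwise step-distance estimates between a small set of states, that only short-range estimates are trusted, and that long-range estimates are recomputed by composing short ones along shortest paths. The function name `aggregate_distances`, the threshold `max_edge`, and the toy numbers are all hypothetical.

```python
import numpy as np


def aggregate_distances(local_dist, max_edge):
    """Min-plus (shortest-path) closure over trusted short-range estimates.

    local_dist[i, j] is an estimated number of steps from state i to state j
    (hypothetical stand-in for a learned goal-conditioned value).
    Estimates above max_edge are treated as unreliable and discarded.
    """
    # Keep only short-range estimates as graph edges; drop the rest.
    d = np.where(local_dist <= max_edge, local_dist, np.inf).astype(float)
    np.fill_diagonal(d, 0.0)

    # Floyd-Warshall: allow each state k in turn as an intermediate waypoint.
    n = d.shape[0]
    for k in range(n):
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d


# Toy example: three states on a chain 0 - 1 - 2.
# The direct estimate between states 0 and 2 is an artifact (5 instead of 2).
local = np.array([
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
])
agg = aggregate_distances(local, max_edge=1.5)
```

In this toy case the unreliable long-range estimate is replaced by the sum of the two short-range estimates through state 1. Under a sparse goal-reaching reward, an aggregated distance `d` could be mapped back to a value such as `gamma ** d`, which is again an assumption of this sketch rather than a detail stated in the abstract.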

Cite

Text

Bagatella and Martius. "Goal-Conditioned Offline Planning from Curious Exploration." Neural Information Processing Systems, 2023.

Markdown

[Bagatella and Martius. "Goal-Conditioned Offline Planning from Curious Exploration." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/bagatella2023neurips-goalconditioned/)

BibTeX

@inproceedings{bagatella2023neurips-goalconditioned,
  title     = {{Goal-Conditioned Offline Planning from Curious Exploration}},
  author    = {Bagatella, Marco and Martius, Georg},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/bagatella2023neurips-goalconditioned/}
}