Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
Abstract
Meta-reinforcement learning (meta-RL) has proven to be a successful framework for leveraging experience from prior tasks to rapidly learn new related tasks; however, current meta-RL approaches struggle to learn in sparse reward environments. Although existing meta-RL algorithms can learn strategies for adapting to new sparse reward tasks, the actual adaptation strategies are learned using hand-shaped reward functions, or require simple environments where random exploration is sufficient to encounter sparse reward. In this paper we present a formulation of hindsight relabelling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward. We demonstrate the effectiveness of our approach on a suite of challenging sparse reward environments that previously required dense reward during meta-training to solve. Our approach solves these environments using the true sparse reward function, with performance comparable to training with a proxy dense reward function.
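To make the core idea concrete, the sketch below illustrates hindsight relabelling for a goal-reaching task with a sparse indicator reward: a trajectory collected while attempting one task is copied and relabelled with a goal that was actually achieved, and the sparse reward is recomputed under the relabelled task. This is a minimal illustration under assumed conventions, not the paper's implementation; the names `sparse_reward`, `hindsight_relabel`, `achieved_goal`, and `goal_radius` are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, goal_radius=0.1):
    # Illustrative sparse reward: 1 only when the achieved state lies within
    # goal_radius of the task's goal, otherwise 0.
    achieved_goal = np.asarray(achieved_goal)
    goal = np.asarray(goal)
    return float(np.linalg.norm(achieved_goal - goal) <= goal_radius)

def hindsight_relabel(trajectory, goal_radius=0.1):
    """Relabel a trajectory with a goal that was actually reached.

    `trajectory` is a list of transition dicts with keys 'obs', 'action',
    'next_obs', 'achieved_goal', and 'goal'. Returns a copy in which the
    original (possibly never-reached) goal is replaced by the final achieved
    goal, and the sparse reward is recomputed for the relabelled task.
    """
    new_goal = trajectory[-1]["achieved_goal"]
    relabelled = []
    for t in trajectory:
        relabelled.append({
            "obs": t["obs"],
            "action": t["action"],
            "next_obs": t["next_obs"],
            "goal": new_goal,
            "reward": sparse_reward(t["achieved_goal"], new_goal, goal_radius),
        })
    return relabelled
```

In a meta-RL setting, relabelled copies of trajectories can be stored in the replay buffer alongside the originals so that an off-policy meta-learner sees non-zero reward signal for some task during meta-training, even when the originally assigned sparse reward tasks are rarely solved by random exploration.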
Cite
Text
Packer et al. "Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL." Neural Information Processing Systems, 2021.
Markdown
[Packer et al. "Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/packer2021neurips-hindsight/)
BibTeX
@inproceedings{packer2021neurips-hindsight,
  title = {{Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL}},
  author = {Packer, Charles and Abbeel, Pieter and Gonzalez, Joseph E},
  booktitle = {Neural Information Processing Systems},
  year = {2021},
  url = {https://mlanthology.org/neurips/2021/packer2021neurips-hindsight/}
}