USHER: Unbiased Sampling for Hindsight Experience Replay

Abstract

Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This provides both a minimum density of reward and generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm that addresses this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
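
To make the relabeling idea and the importance-sampling correction concrete, here is a minimal, self-contained sketch. It is not the paper's USHER algorithm: `her_relabel`, `importance_weight`, `weighted_td_update`, and the two goal densities are hypothetical illustrations, and the ratio `p_target / p_hindsight` stands in for whatever densities an actual implementation would estimate.

```python
"""Minimal sketch of hindsight relabeling with an importance-sampling
correction, following the idea in the abstract. All names and densities
here are hypothetical illustrations, not the paper's exact algorithm."""
import numpy as np

rng = np.random.default_rng(0)

def her_relabel(trajectory, k=4):
    """HER 'future' strategy: reuse each transition with goals achieved
    later in the same episode, turning failures into successes.
    (Real HER also keeps the original-goal transition; omitted here.)"""
    relabeled = []
    for t, (s, a, s_next, goal) in enumerate(trajectory):
        for i in rng.integers(t, len(trajectory), size=k):
            achieved = trajectory[i][2]  # treat a later state as the goal
            relabeled.append((s, a, s_next, achieved))
    return relabeled

def importance_weight(goal, p_target, p_hindsight, clip=10.0):
    """Ratio of the probability of sampling `goal` under the original
    goal distribution to its probability under hindsight relabeling.
    Reweighting by this ratio makes the expected update unbiased,
    at the cost of added variance (hence the clipping)."""
    w = p_target(goal) / max(p_hindsight(goal), 1e-8)
    return min(w, clip)

def weighted_td_update(V, s, s_next, reward, goal, w, alpha=0.1, gamma=0.98):
    """One TD(0) step on a dictionary-backed value table, scaled by the
    importance weight so stochastic bad outcomes keep their true mass."""
    key, key_next = (s, goal), (s_next, goal)
    target = reward + gamma * V.get(key_next, 0.0)
    V[key] = V.get(key, 0.0) + alpha * w * (target - V.get(key, 0.0))

# Toy usage: a 1-D chain where the agent failed to reach goal state 5.
trajectory = [(s, +1, s + 1, 5) for s in range(4)]  # (s, a, s_next, goal)
V = {}
p_target = lambda g: 1.0 / 6.0   # hypothetical: goals uniform over 0..5
p_hindsight = lambda g: 0.25     # hypothetical density of relabeled goals
for s, a, s_next, g in her_relabel(trajectory):
    w = importance_weight(g, p_target, p_hindsight)
    reward = 1.0 if s_next == g else 0.0
    weighted_td_update(V, s, s_next, reward, g, w)
```

Reweighting each relabeled sample by this ratio recovers, in expectation, the update that would have been taken under the original goal distribution, which is the sense in which an importance-sampling correction removes HER's bias; the clipping trades a little residual bias for lower variance.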

Cite

Text

Schramm et al. "USHER: Unbiased Sampling for Hindsight Experience Replay." Conference on Robot Learning, 2022.

Markdown

[Schramm et al. "USHER: Unbiased Sampling for Hindsight Experience Replay." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/schramm2022corl-usher/)

BibTeX

@inproceedings{schramm2022corl-usher,
  title     = {{USHER: Unbiased Sampling for Hindsight Experience Replay}},
  author    = {Schramm, Liam and Deng, Yunfu and Granados, Edgar and Boularias, Abdeslam},
  booktitle = {Conference on Robot Learning},
  year      = {2022},
  pages     = {2073--2082},
  volume    = {205},
  url       = {https://mlanthology.org/corl/2022/schramm2022corl-usher/}
}