USHER: Unbiased Sampling for Hindsight Experience Replay
Abstract
Dealing with sparse rewards is a long-standing challenge in reinforcement learning (RL). Hindsight Experience Replay (HER) addresses this problem by reusing failed trajectories for one goal as successful trajectories for another. This allows for both a minimum density of reward and generalization across multiple goals. However, this strategy is known to result in a biased value function, as the update rule underestimates the likelihood of bad outcomes in a stochastic environment. We propose an asymptotically unbiased importance-sampling-based algorithm to address this problem without sacrificing performance on deterministic environments. We show its effectiveness on a range of robotic systems, including challenging high-dimensional stochastic environments.
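As a rough illustration of the relabeling step the abstract describes, here is a minimal Python sketch of HER's "future" goal-relabeling strategy. The class and parameter names (HindsightReplayBuffer, relabel_prob) are hypothetical, not from the paper, and the importance-sampling correction that USHER proposes is only noted in a comment rather than implemented.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: tuple
    action: int
    next_state: tuple
    goal: tuple  # goal the agent was pursuing when it acted


class HindsightReplayBuffer:
    """Toy buffer implementing HER-style "future" goal relabeling."""

    def __init__(self, relabel_prob: float = 0.8):
        self.episodes: list[list[Transition]] = []
        self.relabel_prob = relabel_prob

    def store_episode(self, transitions: list[Transition]) -> None:
        self.episodes.append(transitions)

    def sample(self):
        """Sample one transition, possibly relabeled with a hindsight goal."""
        episode = random.choice(self.episodes)
        t = random.randrange(len(episode))
        tr = episode[t]
        goal = tr.goal
        if random.random() < self.relabel_prob:
            # "Future" relabeling: pretend the agent was pursuing a state
            # it actually reached later in this episode, turning a failed
            # trajectory into a successful one. In a stochastic environment
            # this over-represents lucky outcomes, which is the source of
            # the value-function bias; USHER corrects it by reweighting
            # such samples with an importance-sampling ratio (not shown).
            future_t = random.randrange(t, len(episode))
            goal = episode[future_t].next_state
        reward = 1.0 if tr.next_state == goal else 0.0
        return tr.state, tr.action, reward, tr.next_state, goal


# Tiny usage example with a two-step episode that missed its goal (5,):
buf = HindsightReplayBuffer()
buf.store_episode([
    Transition(state=(0,), action=0, next_state=(1,), goal=(5,)),
    Transition(state=(1,), action=1, next_state=(2,), goal=(5,)),
])
print(buf.sample())
```

With relabel_prob high, most sampled transitions are scored against goals the agent actually reached, which is what yields a usable reward density under sparse rewards but also what biases the value estimate in stochastic environments.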
Cite
Text
Schramm et al. "USHER: Unbiased Sampling for Hindsight Experience Replay." Conference on Robot Learning, 2022.
Markdown
[Schramm et al. "USHER: Unbiased Sampling for Hindsight Experience Replay." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/schramm2022corl-usher/)
BibTeX
@inproceedings{schramm2022corl-usher,
  title = {{USHER: Unbiased Sampling for Hindsight Experience Replay}},
  author = {Schramm, Liam and Deng, Yunfu and Granados, Edgar and Boularias, Abdeslam},
  booktitle = {Conference on Robot Learning},
  year = {2022},
  pages = {2073--2082},
  volume = {205},
  url = {https://mlanthology.org/corl/2022/schramm2022corl-usher/}
}