Exploration-Guided Reward Shaping for Reinforcement Learning Under Sparse Rewards

Abstract

We study the problem of reward shaping to accelerate the training process of a reinforcement learning agent. Existing works have considered a number of different reward shaping formulations; however, they either require external domain knowledge or fail in environments with extremely sparse rewards. In this paper, we propose a novel framework, Exploration-Guided Reward Shaping (ExploRS), that operates in a fully self-supervised manner and can accelerate an agent's learning even in sparse-reward environments. The key idea of ExploRS is to learn an intrinsic reward function in combination with exploration-based bonuses to maximize the agent's utility w.r.t. extrinsic rewards. We theoretically showcase the usefulness of our reward shaping framework in a special family of MDPs. Experimental results on several environments with sparse/noisy reward signals demonstrate the effectiveness of ExploRS.
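The key idea in the abstract, shaping the sparse extrinsic reward with a learned intrinsic reward plus an exploration bonus, can be pictured with a short sketch. Everything below is illustrative: the class name, the count-based 1/sqrt(n) bonus, and the intrinsic_model interface are assumptions for exposition, not the paper's actual formulation, which should be taken from the paper itself.

```python
# Illustrative sketch (not the authors' implementation): combine the
# environment's extrinsic reward with a learned intrinsic reward and a
# novelty-based exploration bonus. The count-based bonus form is an
# assumption chosen for simplicity.
from collections import defaultdict
import numpy as np

class ShapedReward:
    def __init__(self, intrinsic_model, bonus_scale=1.0):
        self.intrinsic_model = intrinsic_model   # learned intrinsic reward, callable (state, action) -> float
        self.bonus_scale = bonus_scale
        self.visit_counts = defaultdict(int)     # state visitation counts for the exploration bonus

    def __call__(self, state, action, extrinsic_reward):
        # Count-based exploration bonus: large for rarely visited states, decays with revisits.
        self.visit_counts[state] += 1
        exploration_bonus = self.bonus_scale / np.sqrt(self.visit_counts[state])

        # Learned intrinsic reward, trained in a self-supervised way so that
        # maximizing the shaped reward still maximizes extrinsic return.
        intrinsic_reward = self.intrinsic_model(state, action)

        return extrinsic_reward + intrinsic_reward + exploration_bonus
```

In such a setup, the agent would be trained on the shaped reward returned by this callable rather than on the raw sparse extrinsic signal, while the intrinsic reward model is updated alongside the policy.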

Cite

Text

Devidze et al. "Exploration-Guided Reward Shaping for Reinforcement Learning Under Sparse Rewards." Neural Information Processing Systems, 2022.

Markdown

[Devidze et al. "Exploration-Guided Reward Shaping for Reinforcement Learning Under Sparse Rewards." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/devidze2022neurips-explorationguided/)

BibTeX

@inproceedings{devidze2022neurips-explorationguided,
  title     = {{Exploration-Guided Reward Shaping for Reinforcement Learning Under Sparse Rewards}},
  author    = {Devidze, Rati and Kamalaruban, Parameswaran and Singla, Adish},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/devidze2022neurips-explorationguided/}
}