Reward Prediction Error as an Exploration Objective in Deep RL
Abstract
A major challenge in reinforcement learning is exploration, particularly in tasks where local dithering methods such as epsilon-greedy sampling are insufficient. Many recent methods intrinsically motivate an agent to seek novel states, driving it to discover improved reward. However, while state-novelty exploration methods are suitable for tasks where novel observations correlate well with improved reward, they may explore no more efficiently than epsilon-greedy approaches in environments where the two are not well-correlated. In this paper, we distinguish between exploration tasks in which seeking novel states aids in finding new reward and those where it does not, such as goal-conditioned tasks and escaping local reward maxima. We propose a new exploration objective: maximizing the reward prediction error (RPE) of a value function trained to predict extrinsic reward. We then propose a deep reinforcement learning method, QXplore, which exploits the temporal-difference error of a Q-function to solve hard exploration tasks in high-dimensional MDPs. We demonstrate the exploration behavior of QXplore on several OpenAI Gym MuJoCo tasks and Atari games, and observe that QXplore matches or exceeds a baseline state-novelty method in all cases, outperforming the baseline on tasks where state novelty is not well-correlated with improved reward.
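For intuition, here is a minimal tabular sketch of the RPE signal the abstract describes: the exploration bonus for a transition (s, a, r, s') is the absolute temporal-difference error of a Q-function trained on extrinsic reward. The function name `rpe_bonus`, the tabular Q array, and the toy sizes are illustrative assumptions for this sketch; the paper's QXplore trains a second Q-function with deep function approximation to maximize this signal, which is not implemented here.

```python
import numpy as np

def rpe_bonus(Q, s, a, r, s_next, gamma=0.99):
    """Absolute TD error |r + gamma * max_a' Q(s', a') - Q(s, a)|,
    used as the intrinsic exploration reward in this sketch."""
    td_target = r + gamma * np.max(Q[s_next])
    return abs(td_target - Q[s, a])

# Hypothetical usage: a toy Q-table over 5 states and 2 actions.
Q = np.zeros((5, 2))
bonus = rpe_bonus(Q, s=0, a=1, r=1.0, s_next=3)
print(bonus)  # 1.0: the extrinsic reward was entirely unpredicted
```

Under this objective, transitions whose extrinsic reward the Q-function fails to predict yield a large bonus, so an exploration policy maximizing it is drawn toward regions where the reward model is wrong, rather than toward merely novel observations.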
Cite
Text
Simmons-Edler et al. "Reward Prediction Error as an Exploration Objective in Deep RL." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/390
Markdown
[Simmons-Edler et al. "Reward Prediction Error as an Exploration Objective in Deep RL." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/simmonsedler2020ijcai-reward/) doi:10.24963/IJCAI.2020/390
BibTeX
@inproceedings{simmonsedler2020ijcai-reward,
title = {{Reward Prediction Error as an Exploration Objective in Deep RL}},
author = {Simmons-Edler, Riley and Eisner, Ben and Yang, Daniel and Bisulco, Anthony and Mitchell, Eric and Seung, H. Sebastian and Lee, Daniel D.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2020},
pages = {2816--2823},
doi = {10.24963/IJCAI.2020/390},
url = {https://mlanthology.org/ijcai/2020/simmonsedler2020ijcai-reward/}
}