Fast Deep Reinforcement Learning Using Online Adjustments from the Past
Abstract
We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by prioritised sweeping over experience tuples from the replay buffer near the current state. EVA brings together a number of recent ideas on incorporating episodic memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA is performant on a demonstration task and Atari games.
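To make the mechanism in the abstract concrete, below is a minimal sketch of the decision-time value adjustment: the agent's parametric Q-values are blended with a non-parametric estimate obtained by backing value up along a trajectory retrieved from the replay buffer near the current state. The function names (`trajectory_backup`, `eva_q`), the mixing weight `lam`, and the simple n-step backward backup are illustrative assumptions, not the paper's exact planning procedure.

```python
import numpy as np

def trajectory_backup(rewards, actions, q_theta, gamma=0.99):
    """Propagate value backwards along one trajectory retrieved from the
    replay buffer, bootstrapping from the parametric Q-values.

    rewards[t], actions[t] describe step t of the stored trajectory;
    q_theta[t] holds the network's Q-values for the state at step t
    (one extra row for the final state). Returns non-parametric Q
    estimates of the same shape as q_theta.
    """
    q_np = np.array(q_theta, dtype=float)        # start from the parametric estimates
    v_next = q_np[-1].max()                      # bootstrap value at the trajectory's end
    for t in reversed(range(len(rewards))):
        # Back up the observed reward along the action actually taken.
        q_np[t, actions[t]] = rewards[t] + gamma * v_next
        v_next = q_np[t].max()                   # greedy value at step t
    return q_np

def eva_q(q_theta_current, q_np_current, lam=0.5):
    """Ephemeral adjustment: act on a convex combination of the parametric
    and non-parametric value estimates for the current state."""
    return lam * q_theta_current + (1.0 - lam) * q_np_current

# Tiny usage example with 3 actions and a 2-step trajectory (made-up numbers).
q_theta = np.array([[0.1, 0.3, 0.2],   # Q_theta(s_0, .)
                    [0.0, 0.5, 0.1],   # Q_theta(s_1, .)
                    [0.4, 0.2, 0.0]])  # Q_theta(s_2, .)
q_np = trajectory_backup(rewards=[1.0, 0.0], actions=[1, 0], q_theta=q_theta)
print(eva_q(q_theta[0], q_np[0]))      # blended action values for s_0
```

In the paper the non-parametric estimate comes from planning over several retrieved trajectories (and the adjustment is "ephemeral": it never changes the network's weights); the single-trajectory backup here is only meant to show the shape of the computation.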
Cite
Text
Hansen et al. "Fast Deep Reinforcement Learning Using Online Adjustments from the Past." Neural Information Processing Systems, 2018.
Markdown
[Hansen et al. "Fast Deep Reinforcement Learning Using Online Adjustments from the Past." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/hansen2018neurips-fast/)
BibTeX
@inproceedings{hansen2018neurips-fast,
title = {{Fast Deep Reinforcement Learning Using Online Adjustments from the Past}},
author = {Hansen, Steven and Pritzel, Alexander and Sprechmann, Pablo and Barreto, Andre and Blundell, Charles},
booktitle = {Neural Information Processing Systems},
year = {2018},
pages = {10567-10577},
url = {https://mlanthology.org/neurips/2018/hansen2018neurips-fast/}
}