Fast Deep Reinforcement Learning Using Online Adjustments from the Past

Abstract

We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by prioritised sweeping over experience tuples from the replay buffer near the current state. EVA brings together a number of recent ideas on incorporating episodic memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA performs well on a demonstration task and on Atari games.
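The sketch below illustrates the core idea described in the abstract: blend a parametric Q-value with a non-parametric estimate obtained by planning backwards over trajectories retrieved from the replay buffer near the current state. It is a minimal illustration, assuming a convex mixing weight and a simple backward backup rule; the function and parameter names (trajectory_value_backup, eva_q_value, lambda_mix) are illustrative and not the authors' implementation.

import numpy as np

def trajectory_value_backup(rewards, q_parametric, gamma=0.99):
    # Backward pass over one retrieved trajectory.
    #   rewards:      [T] rewards along the stored trajectory
    #   q_parametric: [T, num_actions] parametric Q-values at the stored states
    # Returns per-step non-parametric value estimates V_NP[t].
    T = len(rewards)
    v_np = np.zeros(T)
    bootstrap = q_parametric[-1].max()  # bootstrap from the parametric critic at the end
    for t in reversed(range(T)):
        # Take the better of following the stored trajectory or acting greedily
        # according to the parametric estimate at this state.
        v_np[t] = max(rewards[t] + gamma * bootstrap, q_parametric[t].max())
        bootstrap = v_np[t]
    return v_np

def eva_q_value(q_theta, q_non_parametric, lambda_mix=0.5):
    # Ephemeral adjustment: convex mix of parametric and non-parametric estimates.
    return lambda_mix * q_theta + (1.0 - lambda_mix) * q_non_parametric

In the full agent, the non-parametric estimate for the current state would presumably be aggregated over several such backups from the nearest stored neighbours before being mixed with the network's prediction; the exact retrieval and planning procedure is specified in the paper.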

Cite

Text

Hansen et al. "Fast Deep Reinforcement Learning Using Online Adjustments from the Past." Neural Information Processing Systems, 2018.

Markdown

[Hansen et al. "Fast Deep Reinforcement Learning Using Online Adjustments from the Past." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/hansen2018neurips-fast/)

BibTeX

@inproceedings{hansen2018neurips-fast,
  title     = {{Fast Deep Reinforcement Learning Using Online Adjustments from the Past}},
  author    = {Hansen, Steven and Pritzel, Alexander and Sprechmann, Pablo and Barreto, Andre and Blundell, Charles},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {10567--10577},
  url       = {https://mlanthology.org/neurips/2018/hansen2018neurips-fast/}
}