Reinforcement Learning with Long Short-Term Memory

Abstract

This paper presents reinforcement learning with a Long Short-Term Memory recurrent neural network: RL-LSTM. Model-free RL-LSTM using Advantage(λ) learning and directed exploration can solve non-Markovian tasks with long-term dependencies between relevant events. This is demonstrated in a T-maze task, as well as in a difficult variation of the pole balancing task.
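The abstract's Advantage(λ) learning builds on Harmon and Baird's advantage learning, which rescales the temporal-difference error by a factor κ so that suboptimal actions are penalized more sharply than in plain Q-learning. The paper pairs this update with an LSTM network; the sketch below shows only the tabular form of the update on a hypothetical four-state corridor task (state names, constants, and the environment are illustrative, not from the paper).

```python
import random

# Tabular advantage learning on a tiny deterministic corridor MDP:
# states 0..3, action 1 moves right, action 0 stays, reaching state 3
# pays reward 1 and ends the episode. GAMMA, KAPPA, ALPHA are
# illustrative constants, not values from the paper.
GAMMA, KAPPA, ALPHA = 0.9, 0.3, 0.1
N = 4
A = {(s, a): 0.0 for s in range(N) for a in (0, 1)}  # advantage table

def step(s, a):
    """Deterministic transition: action 1 moves right, action 0 stays."""
    s2 = min(s + a, N - 1)
    r = 1.0 if s2 == N - 1 else 0.0
    return s2, r, s2 == N - 1

def v(s):
    """State value is the maximum advantage over actions."""
    return max(A[s, 0], A[s, 1])

for _ in range(2000):
    s = 0
    while True:
        a = random.choice((0, 1))          # uniform exploration
        s2, r, done = step(s, a)
        target_v = 0.0 if done else v(s2)
        # Advantage-learning target: V(s) + (TD error) / kappa.
        # Dividing by kappa < 1 widens the gap between the best
        # action and the rest, which aids function approximation.
        target = v(s) + (r + GAMMA * target_v - v(s)) / KAPPA
        A[s, a] += ALPHA * (target - A[s, a])
        if done:
            break
        s = s2

# The greedy policy should move right in every non-terminal state.
policy = [0 if A[s, 0] >= A[s, 1] else 1 for s in range(N - 1)]
print(policy)
```

In the paper itself the advantage table is replaced by an LSTM network whose recurrent state lets the agent bridge long delays between an observation and the action it should later inform, which is what makes the non-Markovian T-maze solvable.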

Cite

Text

Bakker. "Reinforcement Learning with Long Short-Term Memory." Neural Information Processing Systems, 2001.

Markdown

[Bakker. "Reinforcement Learning with Long Short-Term Memory." Neural Information Processing Systems, 2001.](https://mlanthology.org/neurips/2001/bakker2001neurips-reinforcement/)

BibTeX

@inproceedings{bakker2001neurips-reinforcement,
  title     = {{Reinforcement Learning with Long Short-Term Memory}},
  author    = {Bakker, Bram},
  booktitle = {Neural Information Processing Systems},
  year      = {2001},
  pages     = {1475--1482},
  url       = {https://mlanthology.org/neurips/2001/bakker2001neurips-reinforcement/}
}