Real-Time Recurrent Reinforcement Learning

Abstract

We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm, exploiting temporal difference learning and eligibility traces to train the policy and the value function; and (3) an online automatic differentiation algorithm for computing the gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method is capable of solving a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
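
To make the interplay of the three components concrete, below is a minimal sketch of the general idea in numpy: a small tanh RNN backbone whose hidden-state sensitivity is propagated forward online (real-time recurrent learning), feeding TD(λ) actor-critic updates with eligibility traces. All names, the toy environment, and the hyperparameters are illustrative assumptions, not the paper's implementation; for brevity, only the critic's gradient path into the recurrent weights is shown, and the critic head is updated without a trace.

    # Hypothetical sketch of an RTRRL-style update loop (not the authors' code).
    import numpy as np

    rng = np.random.default_rng(0)
    n_obs, n_hid, n_act = 4, 8, 2
    gamma, lam, alpha = 0.99, 0.9, 1e-2     # discount, trace decay, step size

    # Shared recurrent backbone plus linear actor/critic heads.
    Wx = rng.normal(0, 0.3, (n_hid, n_obs))
    Wh = rng.normal(0, 0.3, (n_hid, n_hid))
    w_v = np.zeros(n_hid)                   # critic head
    W_a = np.zeros((n_act, n_hid))          # actor head

    def rnn_step(h, x):
        return np.tanh(Wh @ h + Wx @ x)

    def env_step(a):
        # Toy stand-in for a POMDP: random observations, reward for action 0.
        return rng.normal(size=n_obs), float(a == 0)

    h = np.zeros(n_hid)
    S = np.zeros((n_hid, n_hid * n_hid))    # RTRL sensitivity dh/dWh (flattened)
    z_v = np.zeros(n_hid * n_hid)           # critic eligibility trace over Wh
    z_a = np.zeros((n_act, n_hid))          # actor eligibility trace over W_a
    obs = rng.normal(size=n_obs)

    for t in range(1000):
        h_prev = h
        h = rnn_step(h, obs)

        # Online autodiff (RTRL): carry dh/dWh forward instead of backprop
        # through time. direct[i, j*n+k] = delta_ij * h_prev[k].
        d = 1.0 - h**2                                  # tanh derivative
        direct = np.kron(np.eye(n_hid), h_prev)
        S = d[:, None] * (Wh @ S + direct)

        # Actor: softmax policy over the shared hidden state.
        logits = W_a @ h
        pi = np.exp(logits - logits.max()); pi /= pi.sum()
        a = rng.choice(n_act, p=pi)

        obs, r = env_step(a)
        v = w_v @ h
        v_next = w_v @ rnn_step(h, obs)                 # bootstrap estimate
        delta = r + gamma * v_next - v                  # TD error

        # TD(lambda) with eligibility traces: a three-factor update where the
        # global TD error gates locally accumulated gradient traces.
        z_v = gamma * lam * z_v + w_v @ S               # dV/dWh via RTRL
        Wh += alpha * delta * z_v.reshape(n_hid, n_hid)
        z_a = gamma * lam * z_a + np.outer(np.eye(n_act)[a] - pi, h)
        W_a += alpha * delta * z_a
        w_v += alpha * delta * h                        # critic head update

The key property the sketch illustrates is that every update uses only quantities available at the current time step (hidden state, sensitivity matrix, traces, and the scalar TD error), which is what makes the scheme online and biologically plausible in contrast to backpropagation through time.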

Cite

Text

Lemmel and Grosu. "Real-Time Recurrent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I17.34001

Markdown

[Lemmel and Grosu. "Real-Time Recurrent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lemmel2025aaai-real/) doi:10.1609/AAAI.V39I17.34001

BibTeX

@inproceedings{lemmel2025aaai-real,
  title     = {{Real-Time Recurrent Reinforcement Learning}},
  author    = {Lemmel, Julian and Grosu, Radu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {18189--18197},
  doi       = {10.1609/AAAI.V39I17.34001},
  url       = {https://mlanthology.org/aaai/2025/lemmel2025aaai-real/}
}