Real-Time Recurrent Reinforcement Learning
Abstract
We introduce a biologically plausible RL framework for solving tasks in partially observable Markov decision processes (POMDPs). The proposed algorithm combines three integral parts: (1) a Meta-RL architecture resembling the mammalian basal ganglia; (2) a biologically plausible reinforcement learning algorithm, exploiting temporal-difference learning and eligibility traces to train the policy and the value function; and (3) an online automatic differentiation algorithm for computing gradients with respect to the parameters of a shared recurrent network backbone. Our experimental results show that the method solves a diverse set of partially observable reinforcement learning tasks. The algorithm, which we call real-time recurrent reinforcement learning (RTRRL), serves as a model of learning in biological neural networks, mimicking reward pathways in the basal ganglia.
Cite
Text
Lemmel and Grosu. "Real-Time Recurrent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I17.34001
Markdown
[Lemmel and Grosu. "Real-Time Recurrent Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lemmel2025aaai-real/) doi:10.1609/AAAI.V39I17.34001
BibTeX
@inproceedings{lemmel2025aaai-real,
title = {{Real-Time Recurrent Reinforcement Learning}},
author = {Lemmel, Julian and Grosu, Radu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {18189--18197},
doi = {10.1609/AAAI.V39I17.34001},
url = {https://mlanthology.org/aaai/2025/lemmel2025aaai-real/}
}