Recurrent Action Transformer with Memory
Abstract
Transformers have become increasingly popular in offline reinforcement learning (RL) due to their ability to treat agent trajectories as sequences, reframing policy learning as a sequence modeling task. However, in partially observable environments (POMDPs), effective decision-making depends on retaining information about past events, which standard transformers struggle to do because the quadratic complexity of self-attention limits their context length. One solution to this problem is to extend transformers with memory mechanisms. We propose the Recurrent Action Transformer with Memory (RATE), a novel transformer-based architecture for offline RL that incorporates a recurrent memory mechanism designed to regulate information retention. We evaluate RATE across a diverse set of environments: memory-intensive tasks (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory, and POPGym), as well as standard Atari and MuJoCo benchmarks. Our comprehensive experiments, which compare against a broad range of baselines, demonstrate that RATE significantly improves performance in memory-dependent settings while remaining competitive on standard tasks. These findings underscore the pivotal role of integrated memory mechanisms in offline RL and establish RATE as a unified, high-capacity architecture for effective decision-making over extended horizons.
Cite
Text
Cherepanov et al. "Recurrent Action Transformer with Memory." International Conference on Learning Representations, 2026.
Markdown
[Cherepanov et al. "Recurrent Action Transformer with Memory." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/cherepanov2026iclr-recurrent/)
BibTeX
@inproceedings{cherepanov2026iclr-recurrent,
title = {{Recurrent Action Transformer with Memory}},
author = {Cherepanov, Egor and Staroverov, Aleksei and Kovalev, Alexey and Panov, Aleksandr},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/cherepanov2026iclr-recurrent/}
}