Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient

Abstract

Model-based reinforcement learning (RL) offers a solution to the data inefficiency that plagues most model-free RL algorithms. However, learning a robust world model often requires complex and deep architectures, which are computationally expensive and challenging to train. Within the world model, sequence models play a critical role in making accurate predictions, and various architectures have been explored, each with its own challenges. Currently, recurrent neural network (RNN)-based world models struggle with vanishing gradients and with capturing long-term dependencies. Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention, which scales as $O(n^2)$, where $n$ is the sequence length. To address these challenges, we propose Drama, a state space model (SSM)-based world model built on Mamba, which achieves $O(n)$ memory and computational complexity while effectively capturing long-term dependencies and enabling efficient training on longer sequences. We also introduce a novel sampling method to mitigate the suboptimality caused by an inaccurate world model in the early stages of training. Combining these techniques, Drama achieves a normalised score on the Atari100k benchmark that is competitive with other state-of-the-art (SOTA) model-based RL algorithms while using a world model of only 7 million parameters. Drama is accessible and trainable on off-the-shelf hardware, such as a standard laptop. Our code is available at https://github.com/realwenlongwang/Drama.git.
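To make the $O(n^2)$ versus $O(n)$ contrast concrete, the following is a minimal, illustrative NumPy sketch. It is not Drama's implementation and not the Mamba library's hardware-aware parallel scan. It contrasts a naive attention score matrix, which materialises an $n \times n$ array, with a simplified Mamba-style selective SSM recurrence. That recurrence visits each time step once and carries a fixed-size hidden state. The function names `attention_scores` and `selective_scan` and the parameters `A`, `W_delta`, `W_B` and `W_C` are hypothetical. The discretisation follows the common simplified form ($\bar{A} = e^{\Delta A}$, $\bar{B} = \Delta B$), and the sketch omits the skip term and gating used in full Mamba blocks.

```python
import numpy as np

def attention_scores(x):
    """Naive self-attention scores: an n x n matrix, so memory grows as O(n^2)."""
    # x has shape (n, d); every position is scored against every other position.
    return x @ x.T / np.sqrt(x.shape[1])

def selective_scan(x, A, W_delta, W_B, W_C):
    """Simplified Mamba-style selective SSM scan over x of shape (n, d).

    Hypothetical, illustrative parameters (not Drama's or Mamba's actual code):
      A:        (d, s) negative real diagonal state matrix, one row per channel
      W_delta:  (d, d) projection giving input-dependent step sizes
      W_B, W_C: (d, s) projections giving input-dependent B_t and C_t
    Each step is visited once and the state has fixed size (d, s), so time is
    O(n) and the recurrent state does not grow with the sequence length.
    """
    n, d = x.shape
    h = np.zeros((d, A.shape[1]))                   # hidden state per channel
    y = np.empty_like(x)
    for t in range(n):
        delta = np.log1p(np.exp(x[t] @ W_delta))    # softplus -> positive step size, (d,)
        B = x[t] @ W_B                              # input-dependent B_t, (s,)
        C = x[t] @ W_C                              # input-dependent C_t, (s,)
        A_bar = np.exp(delta[:, None] * A)          # zero-order-hold discretisation of A, (d, s)
        B_bar = delta[:, None] * B[None, :]         # simplified (Euler) discretisation of B, (d, s)
        h = A_bar * h + B_bar * x[t][:, None]       # h_t = A_bar * h_{t-1} + B_bar * x_t
        y[t] = h @ C                                # y_t = C_t h_t, (d,)
    return y

# Usage on random data (shapes chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
n, d, s = 64, 8, 4
x = rng.standard_normal((n, d))
A = -np.exp(rng.standard_normal((d, s)))            # negative entries keep A_bar in (0, 1)
params = [0.1 * rng.standard_normal(shape) for shape in [(d, d), (d, s), (d, s)]]
y = selective_scan(x, A, *params)
print(y.shape)                      # (64, 8): one output per input step
print(attention_scores(x).shape)    # (64, 64): grows quadratically with n
```

Because the step size `delta` and the matrices `B` and `C` depend on the current input, the recurrence can choose what to keep or forget at each step. This is the "selective" property Mamba adds to earlier linear-time SSMs. The recurrence is also causal, so outputs at step $t$ depend only on inputs up to $t$, as a world model predicting the next step requires.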

Cite

Text

Wang et al. "Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient." International Conference on Learning Representations, 2025.

BibTeX

@inproceedings{wang2025iclr-drama,
  title     = {{Drama: Mamba-Enabled Model-Based Reinforcement Learning Is Sample and Parameter Efficient}},
  author    = {Wang, Wenlong and Dusparic, Ivana and Shi, Yucheng and Zhang, Ke and Cahill, Vinny},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/wang2025iclr-drama/}
}