Mamba-3: Improved Sequence Modeling Using State Space Principles
Abstract
Scaling inference-time compute has emerged as an important driver of LLM performance, making inference efficiency a central focus of model design alongside model quality. While current Transformer models deliver strong quality, their quadratic compute and linear memory requirements make inference expensive. This has spurred the development of sub-quadratic models with reduced compute and constant memory requirements. However, many recent linear models trade off model quality and capability for algorithmic efficiency, failing on tasks such as state tracking. Moreover, their theoretically linear inference remains hardware-inefficient in practice. Guided by an inference-first perspective, we introduce three core methodological improvements inspired by the state space model (SSM) viewpoint of linear models. We combine: (1) a more expressive recurrence derived from SSM discretization, (2) a complex-valued state update rule enabling richer state tracking, and (3) a multi-input, multi-output (MIMO) formulation that improves model performance without increasing decode latency. Together with architectural refinements, Mamba-3 achieves significant gains across retrieval, state-tracking, and downstream language modeling tasks. At the 1.5B scale, Mamba-3 improves average downstream accuracy by 0.6 percentage points compared to the next best model (Gated DeltaNet), with the MIMO variant further improving accuracy by an additional 1.2 points, for a total gain of 1.8 points. Across state-size experiments, Mamba-3 achieves comparable perplexity to Mamba-2 despite using half the state size. These results demonstrate that Mamba-3 advances the performance–efficiency frontier.
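The abstract says that a complex-valued state update enables richer state tracking. The sketch below illustrates that general idea with a textbook parity construction. It is not Mamba-3's actual update rule, parameterization, or discretization, and the function names are hypothetical.

It assumes a one-dimensional diagonal recurrence in which each input bit sets a rotation angle. A complex state can flip sign, whereas a real, non-negative decay cannot.

```python
import cmath
import math


def complex_parity(bits):
    """Track the parity of a bit stream with a 1-dim complex diagonal recurrence.

    Illustrative only (hypothetical helper, not the paper's rule):
    h_t = exp(i * pi * x_t) * h_{t-1}. Each 1 rotates the state by pi,
    and each 0 rotates it by 0, leaving it unchanged.
    """
    h = complex(1.0, 0.0)  # start at +1 on the unit circle
    for x in bits:
        h = cmath.exp(1j * math.pi * x) * h  # input-dependent rotation
    # The real part is about +1 after an even number of ones and about -1 after an odd number.
    return 0 if h.real > 0 else 1


def real_decay_state(bits, dt=0.5):
    """Feed the same stream through a real, non-negative decay h_t = exp(-dt * x_t) * h_{t-1}.

    Illustrative only (hypothetical helper): the state shrinks monotonically
    with the number of ones and never changes sign. A single threshold on it
    therefore cannot separate even counts from odd counts.
    """
    h = 1.0
    for x in bits:
        h = math.exp(-dt * x) * h  # real decay, always in (0, 1]
    return h


if __name__ == "__main__":
    print(complex_parity([1, 0, 1, 1]))  # three ones: odd parity
    print(real_decay_state([1, 1]), real_decay_state([1]))
```

The rotation angle of pi is chosen by hand so that two ones return the state to +1. In a learned model, the angles would be input-dependent parameters. The real-valued contrast shows why a decay-only state lacks the sign flip that this construction relies on.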
Cite
Lahoti et al. "Mamba-3: Improved Sequence Modeling Using State Space Principles." International Conference on Learning Representations, 2026.
@inproceedings{lahoti2026iclr-mamba3,
title = {{Mamba-3: Improved Sequence Modeling Using State Space Principles}},
author = {Lahoti, Aakash and Li, Kevin and Chen, Berlin and Wang, Caitlin and Bick, Aviv and Kolter, J Zico and Dao, Tri and Gu, Albert},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/lahoti2026iclr-mamba3/}
}