Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs
Abstract
Sequence modeling tasks across domains such as natural language processing, time-series forecasting, speech recognition, and control require learning complex mappings from input to output sequences. In recurrent networks, nonlinear recurrence is theoretically required to universally approximate such sequence-to-sequence functions; yet in practice, linear recurrent models have often proven surprisingly effective. This raises the question of when nonlinearity is truly required. In this study, we present a framework to systematically dissect the functional role of nonlinearity in recurrent networks -- allowing us to identify both when it is computationally necessary, and what mechanisms it enables. We address this question using Almost Linear Recurrent Neural Networks (AL-RNNs), which allow the recurrence nonlinearity to be gradually attenuated and decompose network dynamics into analyzable linear regimes, making the underlying computational mechanisms explicit. We illustrate the framework across a diverse set of synthetic and real-world tasks, including classic sequence modeling benchmarks, an empirical neuroscientific stimulus-selection task, and a multi-task suite. We demonstrate how the AL-RNN's piecewise linear structure enables direct identification of computational primitives such as gating, rule-based integration, and memory-dependent transients, revealing that these operations emerge within predominantly linear dynamical backbones. Across tasks, sparse nonlinearity plays several functional roles: it improves interpretability by reducing and localizing nonlinear computations, promotes shared (rather than highly distributed) representations in multi-task settings, and reduces computational cost by limiting nonlinear operations. Moreover, sparse nonlinearity acts as a useful inductive bias: in low-data regimes, or when tasks require discrete switching between linear regimes, sparsely nonlinear models often match or exceed the performance of fully nonlinear architectures. Our findings provide a principled approach for identifying where nonlinearity is functionally necessary in sequence models, guiding the design of recurrent architectures that balance performance, efficiency, and mechanistic interpretability.
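The abstract describes AL-RNNs as recurrent networks in which only a subset of latent units is passed through a nonlinearity, so that the dynamics decompose into piecewise linear regimes. The sketch below illustrates one plausible reading of such an update in PyTorch; the cell structure, parameter names (`A`, `W`, `C`, `h`), and the initialization are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class ALRNNCellSketch(nn.Module):
    """Minimal sketch of an almost-linear RNN cell: only the last `p` of the
    `m` latent units are passed through a ReLU; the remaining units evolve
    linearly. Hypothetical names, not the paper's reference implementation."""

    def __init__(self, input_dim: int, m: int, p: int):
        super().__init__()
        assert 0 <= p <= m
        self.m, self.p = m, p
        self.A = nn.Parameter(0.9 * torch.eye(m))                 # linear recurrence
        self.W = nn.Parameter(0.01 * torch.randn(m, m))           # weights on (partially) rectified state
        self.C = nn.Parameter(0.01 * torch.randn(m, input_dim))   # input weights
        self.h = nn.Parameter(torch.zeros(m))                     # bias

    def forward(self, z: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        # Apply ReLU only to the last p units; identity on the rest.
        if self.p > 0:
            phi = torch.cat(
                [z[..., : self.m - self.p], torch.relu(z[..., self.m - self.p :])],
                dim=-1,
            )
        else:
            phi = z
        return z @ self.A.T + phi @ self.W.T + s @ self.C.T + self.h
```

Under this reading, setting `p = 0` recovers a fully linear recurrence, while `p = m` yields a standard piecewise-linear (ReLU) RNN; intermediate values of `p` correspond to the sparsely nonlinear models discussed in the abstract.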
Cite
Text
Brenner and Koppe. "Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs." Transactions on Machine Learning Research, 2026.Markdown
[Brenner and Koppe. "Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/brenner2026tmlr-uncovering/)BibTeX
@article{brenner2026tmlr-uncovering,
  title = {{Uncovering the Computational Roles of Nonlinearity in Sequence Modeling Using Almost-Linear RNNs}},
  author = {Brenner, Manuel and Koppe, Georgia},
  journal = {Transactions on Machine Learning Research},
  year = {2026},
  url = {https://mlanthology.org/tmlr/2026/brenner2026tmlr-uncovering/}
}