Linear Transformers Implicitly Discover Unified Numerical Algorithms
Abstract
A transformer is merely a stack of learned data-to-data maps, yet those maps can hide rich algorithms. We train a linear, attention-only transformer on millions of masked-block completion tasks: each prompt is a masked low-rank matrix whose missing block may be (i) a scalar prediction target or (ii) an unseen kernel slice for Nyström extrapolation. The model sees only input-output pairs and a mean-squared loss; it is given no normal equations, no handcrafted iterations, and no hint that the tasks are related. Surprisingly, after training, algebraic unrolling reveals the same parameter-free update rule across all three resource regimes (full visibility, bandwidth-limited heads, rank-limited attention). We prove that this rule achieves second-order convergence on full-batch problems, cuts distributed iteration complexity, and remains accurate with compute-limited attention. Thus, a transformer trained solely to patch missing blocks implicitly discovers a unified, resource-adaptive iterative solver spanning prediction, estimation, and Nyström extrapolation, highlighting a powerful capability of in-context learning.
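To make the setup concrete, the sketch below is a minimal illustration (not the authors' implementation) of a masked-block completion prompt and a single softmax-free attention layer; training would minimize the mean-squared error on the masked block only. All names and dimensions (n, r, W_q, W_k, W_v) are assumptions made for this example.

```python
# Illustrative sketch only: one masked low-rank prompt plus one linear
# (softmax-free) attention layer. Variable names and shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def make_prompt(n=16, r=3):
    """Low-rank matrix Z = U V^T with its bottom-right block masked to zero."""
    U, V = rng.normal(size=(n, r)), rng.normal(size=(n, r))
    Z = U @ V.T
    mask = np.ones_like(Z)
    mask[n // 2:, n // 2:] = 0.0          # the block the model must complete
    return Z * mask, Z, mask

def linear_attention(X, W_q, W_k, W_v):
    """Attention without softmax: out = (X W_q)(X W_k)^T (X W_v) / n."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return (Q @ K.T) @ V / X.shape[0]

n = 16
X_masked, X_true, mask = make_prompt(n)
W_q, W_k, W_v = (0.1 * rng.normal(size=(n, n)) for _ in range(3))

pred = linear_attention(X_masked, W_q, W_k, W_v)
# Mean-squared loss restricted to the masked block:
mse = np.mean(((pred - X_true) * (1.0 - mask)) ** 2)
print(f"masked-block MSE with untrained weights: {mse:.3f}")
```

Because each layer omits the softmax, its output is a polynomial in the prompt entries, which is what makes the algebraic unrolling of the trained stack into an explicit update rule tractable.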
Cite
Text
Lutz et al. "Linear Transformers Implicitly Discover Unified Numerical Algorithms." Advances in Neural Information Processing Systems, 2025.
Markdown
[Lutz et al. "Linear Transformers Implicitly Discover Unified Numerical Algorithms." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/lutz2025neurips-linear/)
BibTeX
@inproceedings{lutz2025neurips-linear,
  title     = {{Linear Transformers Implicitly Discover Unified Numerical Algorithms}},
  author    = {Lutz, Patrick and Gangrade, Aditya and Daneshmand, Hadi and Saligrama, Venkatesh},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/lutz2025neurips-linear/}
}