Noise Stability of Transformer Models

Abstract

Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains and fails to explain the "junta-like" input dependence we empirically observe in modern LLMs. To address these limitations, we propose *noise stability* as a more comprehensive simplicity metric. Noise stability expresses a model's robustness to correlated noise applied to *all* input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers and tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical *noise stability regularization* method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately 35% and 75%, respectively. Our results establish noise stability as a powerful tool for understanding and improving modern Transformers.
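As a rough illustration of the quantity the abstract names, the sketch below estimates noise stability in its textbook Boolean setting: for f on {-1, 1}^n, Stab_ρ(f) = E[f(x) f(y)], where y is a ρ-correlated copy of x in which every coordinate is perturbed independently. This is an assumption-laden toy, not the paper's definition for real-valued Transformer inputs or its regularizer; the function names (`correlated_copy`, `noise_stability`, `dictator`, `parity`) are hypothetical and chosen only for this example.

```python
import random


def correlated_copy(x, rho, rng):
    """Return y where each y_i equals x_i with probability (1 + rho) / 2,
    and -x_i otherwise, so that E[x_i * y_i] = rho for every coordinate."""
    keep = (1.0 + rho) / 2.0
    return [xi if rng.random() < keep else -xi for xi in x]


def noise_stability(f, n, rho, samples=20000, seed=0):
    """Monte Carlo estimate of Stab_rho(f) = E[f(x) * f(y)] for
    f: {-1, 1}^n -> {-1, 1}, with x uniform and y a rho-correlated copy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        y = correlated_copy(x, rho, rng)  # noise hits ALL coordinates at once
        total += f(x) * f(y)
    return total / samples


def dictator(x):
    """Depends on a single coordinate (a 1-junta)."""
    return x[0]


def parity(x):
    """Depends on every coordinate; product of all bits."""
    p = 1
    for xi in x:
        p *= xi
    return p


if __name__ == "__main__":
    rho, n = 0.8, 5
    print("dictator:", noise_stability(dictator, n, rho))
    print("parity:  ", noise_stability(parity, n, rho))
```

In this Boolean setting the exact values are known: a dictator has stability ρ and n-bit parity has stability ρ^n, so the function that depends on one coordinate stays far more stable than the one that depends on all of them. That gap is the sense in which noise stability can distinguish junta-like functions from ones that depend on many coordinates.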

Cite

Text

Haris et al. "Noise Stability of Transformer Models." International Conference on Learning Representations, 2026.

Markdown

[Haris et al. "Noise Stability of Transformer Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/haris2026iclr-noise/)

BibTeX

@inproceedings{haris2026iclr-noise,
  title     = {{Noise Stability of Transformer Models}},
  author    = {Haris, Themistoklis and Zhang, Zihan and Yoshida, Yuichi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/haris2026iclr-noise/}
}