When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models

Sanyal, Sunny; Shwartz-Ziv, Ravid; Dimakis, Alex; Sanghavi, Sujay

TMLR 2026

Abstract

Large Language Models (LLMs) are known for their strong performance, but we uncover a significant structural inefficiency: a phenomenon we term attention collapse. In many pre-trained decoder-style LLMs, the attention matrices in deeper layers degenerate, collapsing to near-rank-one structures. These underutilized layers, which we call lazy layers, are redundant and impair model efficiency. To address this, we introduce Inheritune, a simple yet powerful training recipe designed to build smaller, stronger language models. Inheritune initializes a compact model by inheriting the potent early layers from a larger pre-trained model and then progressively trains and expands it. Our experiments on various models, including the GPT-2 family, demonstrate that models trained with Inheritune can match or even surpass the performance of their larger counterparts despite having significantly fewer layers. This work presents a novel path toward model compression by design, enabling the creation of compact yet highly performant language models.
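To make "collapsing to near rank-one" concrete, here is a minimal sketch of one way such degeneracy could be quantified: the share of squared singular-value mass held by the largest singular value of an attention matrix. This is an illustrative assumption, not necessarily the metric used in the paper; the function name `top_singular_mass` and the two toy matrices are hypothetical.

```python
import numpy as np


def top_singular_mass(attn: np.ndarray) -> float:
    """Share of squared singular-value mass in the largest singular value.

    A value close to 1 means the matrix is close to rank one.
    (Illustrative measure only; assumed, not taken from the paper.)
    """
    s = np.linalg.svd(attn, compute_uv=False)  # singular values, descending
    return float(s[0] ** 2 / np.sum(s ** 2))


n = 8  # toy sequence length

# Toy "collapsed" causal attention: every token attends only to token 0,
# so all rows are identical and the matrix is exactly rank one.
collapsed = np.zeros((n, n))
collapsed[:, 0] = 1.0

# Toy "diverse" causal attention: every token attends only to itself,
# so the matrix is the identity and has full rank.
diverse = np.eye(n)

collapsed_score = top_singular_mass(collapsed)
diverse_score = top_singular_mass(diverse)

print(f"collapsed: {collapsed_score:.3f}")
print(f"diverse:   {diverse_score:.3f}")
```

Both toy matrices are valid causal attention patterns (lower triangular, rows summing to one). The collapsed one scores close to 1, while the identity pattern scores 1/n, so a per-layer score near 1 would be one possible signal of a lazy layer under this assumed measure.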

Cite

Text

Sanyal et al. "When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models." Transactions on Machine Learning Research, 2026.

Markdown

[Sanyal et al. "When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/sanyal2026tmlr-attention/)

BibTeX

@article{sanyal2026tmlr-attention,
  title     = {{When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models}},
  author    = {Sanyal, Sunny and Shwartz-Ziv, Ravid and Dimakis, Alex and Sanghavi, Sujay},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/sanyal2026tmlr-attention/}
}