Lambda-Skip Connections: The Architectural Component That Prevents Rank Collapse

Abstract

Rank collapse, a phenomenon where embedding vectors in sequence models rapidly converge to a uniform token or equilibrium state, has recently gained attention in the deep learning literature. This phenomenon leads to reduced expressivity and potential training instabilities due to vanishing gradients. Empirical evidence suggests that architectural components like skip connections, LayerNorm, and MultiLayer Perceptrons (MLPs) play critical roles in mitigating rank collapse. While this issue is well-documented for transformers, alternative sequence models, such as State Space Models (SSMs), which have recently gained prominence, have not been thoroughly examined for similar vulnerabilities. This paper extends the theory of rank collapse from transformers to SSMs using a unifying framework that captures both architectures. We introduce a modification in the skip connection component, termed lambda-skip connections, that provides guarantees for rank collapse prevention. We present, via analytical results, a sufficient condition to achieve the guarantee for all of the aforementioned architectures. We also study the necessity of this condition via ablation studies and analytical examples. To our knowledge, this is the first study that provides a general guarantee to prevent rank collapse, and that investigates rank collapse in the context of SSMs, offering valuable understanding for both theoreticians and practitioners. Finally, we validate our findings with experiments demonstrating the crucial role of architectural components in preventing rank collapse.
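
The abstract does not spell out the form of a lambda-skip connection. As a rough illustration only, the sketch below assumes it amounts to scaling the skip branch of a residual block by a scalar lambda, i.e. output = lambda * x + mixer(x), and pairs it with a simple diagnostic that measures how close a batch of token embeddings is to rank collapse. The names `LambdaSkipBlock`, `rank_collapse_residual`, the `mixer` argument, and the exact parameterization are assumptions made for this sketch, not the authors' implementation.

```python
import torch
import torch.nn as nn


def rank_collapse_residual(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, d_model). Subtract the per-sequence mean token and
    # return the Frobenius norm of the residual; a value near zero indicates
    # that all token embeddings have collapsed onto (roughly) one vector.
    mean_token = x.mean(dim=1, keepdim=True)          # (batch, 1, d_model)
    return torch.linalg.matrix_norm(x - mean_token)   # (batch,)


class LambdaSkipBlock(nn.Module):
    # Hypothetical residual block whose skip branch is scaled by lambda:
    #   output = lambda * x + mixer(x)
    # where `mixer` is any sequence-mixing layer (e.g. attention or an SSM layer).
    def __init__(self, mixer: nn.Module, lam: float = 1.0, learnable: bool = True):
        super().__init__()
        self.mixer = mixer
        self.lam = nn.Parameter(torch.tensor(float(lam)), requires_grad=learnable)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lam * x + self.mixer(x)


if __name__ == "__main__":
    torch.manual_seed(0)
    block = LambdaSkipBlock(mixer=nn.Linear(16, 16), lam=1.0)
    x = torch.randn(2, 8, 16)
    y = block(x)
    print(rank_collapse_residual(x), rank_collapse_residual(y))
```

In this sketch, fixing lambda at 1 recovers the standard skip connection; the paper's sufficient condition on the skip component is not reproduced here.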

Cite

Text

Joseph et al. "Lambda-Skip Connections: The Architectural Component That Prevents Rank Collapse." International Conference on Learning Representations, 2025.

Markdown

[Joseph et al. "Lambda-Skip Connections: The Architectural Component That Prevents Rank Collapse." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/joseph2025iclr-lambdaskip/)

BibTeX

@inproceedings{joseph2025iclr-lambdaskip,
  title     = {{Lambda-Skip Connections: The Architectural Component That Prevents Rank Collapse}},
  author    = {Joseph, Federico Arangath and Sieber, Jerome and Zeilinger, Melanie and Alonso, Carmen Amo},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/joseph2025iclr-lambdaskip/}
}