Rank Minimization, Alignment and Weight Decay in Neural Networks

Abstract

We empirically study the evolution of the singular values and vectors of neural network weights across a wide variety of practical architectures and domains, including CNNs for image classification, LSTMs for speech recognition, and Transformers for language modeling. Across these settings, we observe that (i) large singular values grow much faster than the rest, decreasing the effective rank of the weight matrices, (ii) this growth occurs despite weak alignment between neighboring layers' singular vectors, even though strong alignment is a common assumption in prior theoretical work, and (iii) weight decay promotes both rank minimization and alignment between neighboring layers. Since these architectures are far from idealized linear neural networks, our observations extend the predictions of existing theory to more practical settings.
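
The abstract refers to two measurable quantities: the effective rank of a weight matrix and the alignment between neighboring layers' singular vectors. The sketch below is not the authors' code; it assumes the entropy-based effective rank of Roy & Vetterli (2007) and measures alignment as the diagonal of V_{l+1}^T U_l for consecutive weights with SVDs W = U diag(s) V^T, which is one common way to quantify the alignment assumed in linear-network theory. The layer pairing and shapes in the usage example are hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact definitions).
import numpy as np


def effective_rank(W: np.ndarray) -> float:
    """exp(entropy) of the normalized singular-value distribution."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    return float(np.exp(entropy))


def neighbor_alignment(W_next: np.ndarray, W_prev: np.ndarray, k: int = 5) -> np.ndarray:
    """|diag(V_next^T U_prev)| for the top-k singular directions.

    Values near 1 mean the input singular vectors of layer l+1 line up with
    the output singular vectors of layer l (the alignment assumed in prior
    theory for linear networks); values near 0 indicate weak alignment.
    """
    _, _, Vt_next = np.linalg.svd(W_next, full_matrices=False)
    U_prev, _, _ = np.linalg.svd(W_prev, full_matrices=False)
    M = Vt_next @ U_prev  # overlap matrix between the two singular bases
    k = min(k, M.shape[0], M.shape[1])
    return np.abs(np.diag(M))[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((256, 128))  # hypothetical layer l
    W2 = rng.standard_normal((64, 256))   # hypothetical layer l+1
    print("effective rank of W2:", effective_rank(W2))
    print("top-5 neighbor alignment:", neighbor_alignment(W2, W1))
```

Tracking these two quantities over training checkpoints is one way to reproduce the kind of observations the abstract describes (falling effective rank, and how weight decay affects both measures).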

Cite

Text

Yunis et al. "Rank Minimization, Alignment and Weight Decay in Neural Networks." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Yunis et al. "Rank Minimization, Alignment and Weight Decay in Neural Networks." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/yunis2024icmlw-rank/)

BibTeX

@inproceedings{yunis2024icmlw-rank,
  title     = {{Rank Minimization, Alignment and Weight Decay in Neural Networks}},
  author    = {Yunis, David and Patel, Kumar Kshitij and Wheeler, Samuel and Savarese, Pedro Henrique Pamplona and Vardi, Gal and Livescu, Karen and Maire, Michael and Walter, Matthew},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/yunis2024icmlw-rank/}
}