From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks
Abstract
Training strategies for modern deep neural networks (NNs) tend to induce a heavy-tailed (HT) empirical spectral density (ESD) in the layer weights. While previous efforts have shown that the HT phenomenon correlates with good generalization in large NNs, a theoretical explanation of its occurrence is still lacking. In particular, understanding the conditions that lead to this phenomenon can shed light on the interplay between generalization and weight spectra. Our work aims to bridge this gap by presenting a simple, rich setting to model the emergence of HT ESDs. Specifically, we present a theory-informed setup for ‘crafting’ heavy tails in the ESD of two-layer NNs and a systematic analysis of HT ESD emergence without any gradient noise. This is the first work to analyze a noise-free setting, and we also incorporate optimizer-dependent (GD/Adam) large learning rates into the HT ESD analysis. Our results highlight the role of learning rates in producing the Bulk+Spike and HT shapes of the ESDs in the early phase of training, which can facilitate generalization in the two-layer NN. These observations shed light on the behavior of large-scale NNs, albeit in a much simpler setting.
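For readers unfamiliar with the term, the ESD of a layer is the distribution of eigenvalues of the layer's weight correlation matrix. The following is a minimal NumPy sketch of that computation, not the authors' code: the function name `empirical_spectral_density`, the normalization by the number of rows, the histogram binning, and the Gaussian test matrix are all assumptions made for illustration.

```python
import numpy as np


def empirical_spectral_density(weight, bins=50):
    """Eigenvalues of W^T W / n and a histogram estimate of their density.

    Assumption: n is the number of rows of `weight`; other normalizations
    are equally common and only rescale the spectrum.
    """
    n = weight.shape[0]
    # Squared singular values of W are the nonzero eigenvalues of W^T W.
    singular_values = np.linalg.svd(weight, compute_uv=False)
    eigenvalues = singular_values ** 2 / n
    # density=True normalizes the histogram so that it integrates to 1.
    density, edges = np.histogram(eigenvalues, bins=bins, density=True)
    return eigenvalues, density, edges


# Hypothetical example: an i.i.d. Gaussian matrix standing in for a layer
# at initialization, before training has changed the shape of the spectrum.
rng = np.random.default_rng(0)
W = rng.normal(size=(512, 256))
eigenvalues, density, edges = empirical_spectral_density(W)
```

Plotting `density` against the bin centers on log-log axes is one common way to inspect whether a spectrum has a compact bulk, isolated spikes, or a slowly decaying tail.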
Cite
Text
Kothapalli et al. "From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks." Transactions on Machine Learning Research, 2025.
Markdown
[Kothapalli et al. "From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/kothapalli2025tmlr-spikes/)
BibTeX
@article{kothapalli2025tmlr-spikes,
title = {{From Spikes to Heavy Tails: Unveiling the Spectral Evolution of Neural Networks}},
author = {Kothapalli, Vignesh and Pang, Tianyu and Deng, Shenyang and Liu, Zongmin and Yang, Yaoqing},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/kothapalli2025tmlr-spikes/}
}