Investigating the Overlooked Hessian Structure: From CNNs to LLMs

Abstract

It is well known that the Hessian of the deep loss landscape matters to the optimization and generalization of deep learning. Previous studies reported a coarse Hessian structure in deep learning, consisting of two components: a small number of large eigenvalues and a large number of nearly-zero eigenvalues. To the best of our knowledge, we are the first to report that a simple but overlooked power-law Hessian structure exists in well-trained deep neural networks, including Convolutional Neural Networks (CNNs) and Large Language Models (LLMs). Moreover, we provide a maximum-entropy theoretical interpretation of the power-law Hessian structure and theoretically demonstrate the existence of a robust and low-dimensional subspace of deep neural networks. Our extensive experiments using the proposed power-law spectral method demonstrate that power-law Hessian spectra critically relate to multiple important behaviors of deep learning, including optimization, generalization, and overparameterization. Notably, we discover that the power-law Hessian structure of a given LLM can effectively predict generalization during training, while conventional sharpness-based generalization measures, which often work well on CNNs, become nearly useless as generalization predictors for LLMs.
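The power-law structure here refers to the decay of the top Hessian eigenvalues, roughly lambda_k proportional to k^(-s). As a minimal sketch (not the authors' code), the snippet below estimates the top-k Hessian eigenvalues of a toy PyTorch model via Lanczos on a matrix-free Hessian-vector-product operator, then fits the exponent s by linear regression in log-log space; the model, data, and choice of k are placeholder assumptions for illustration.

import numpy as np
import torch
import torch.nn as nn
from scipy.sparse.linalg import LinearOperator, eigsh

torch.manual_seed(0)

# Toy model and data stand in for a well-trained network (assumption for illustration).
model = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 2))
x, y = torch.randn(128, 10), torch.randint(0, 2, (128,))
loss = nn.CrossEntropyLoss()(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
n = sum(p.numel() for p in params)

# Build the gradient once with create_graph=True so Hessian-vector
# products can be taken by differentiating the gradient a second time.
grads = torch.autograd.grad(loss, params, create_graph=True)
flat_grad = torch.cat([g.reshape(-1) for g in grads])

def hvp(v):
    """Hessian-vector product H @ v via double backpropagation."""
    v_t = torch.as_tensor(v, dtype=flat_grad.dtype)
    hv = torch.autograd.grad(flat_grad @ v_t, params, retain_graph=True)
    return torch.cat([h.reshape(-1) for h in hv]).detach().numpy()

# Lanczos (ARPACK) on the matrix-free Hessian: top-k algebraic eigenvalues.
k = 20
op = LinearOperator((n, n), matvec=hvp, dtype=np.float32)
eigvals = eigsh(op, k=k, which="LA", return_eigenvectors=False)
eigvals = np.sort(eigvals)[::-1]

# Fit lambda_k ~ lambda_1 * k^(-s): slope of log-eigenvalue vs. log-rank.
pos = eigvals[eigvals > 0]  # at a well-trained minimum the top spectrum is positive
ranks = np.arange(1, len(pos) + 1)
slope, _ = np.polyfit(np.log(ranks), np.log(pos), 1)
print(f"estimated power-law exponent s = {-slope:.3f}")

In this reading, a good straight-line fit in log-log space indicates a power-law spectrum, and the fitted exponent s is the scalar one would track across training as a candidate generalization predictor.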

Cite

Text

Tang et al. "Investigating the Overlooked Hessian Structure: From CNNs to LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Tang et al. "Investigating the Overlooked Hessian Structure: From CNNs to LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/tang2025icml-investigating/)

BibTeX

@inproceedings{tang2025icml-investigating,
  title     = {{Investigating the Overlooked Hessian Structure: From CNNs to LLMs}},
  author    = {Tang, Qian-Yuan and Gu, Yufei and Cai, Yunfeng and Sun, Mingming and Li, Ping and Xun, Zhou and Xie, Zeke},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {58805--58831},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/tang2025icml-investigating/}
}