Investigating the Overlooked Hessian Structure: From CNNs to LLMs
Abstract
It is well known that the Hessian of the deep learning loss landscape matters to the optimization and generalization of deep neural networks. Previous studies reported a rough Hessian structure in deep learning, consisting of two components: a small number of large eigenvalues and a large number of near-zero eigenvalues. To the best of our knowledge, we are the first to report that a simple but overlooked power-law Hessian structure exists in well-trained deep neural networks, including Convolutional Neural Networks (CNNs) and Large Language Models (LLMs). Moreover, we provide a maximum-entropy theoretical interpretation for the power-law Hessian structure and theoretically demonstrate the existence of a robust and low-dimensional subspace of deep neural networks. Our extensive experiments using the proposed power-law spectral method demonstrate that the power-law Hessian spectra critically relate to multiple important behaviors of deep learning, including optimization, generalization, and overparameterization. Notably, we discover that the power-law Hessian structure of a given LLM can effectively predict generalization during training, while conventional sharpness-based generalization measures, which often work well on CNNs, become nearly useless as generalization predictors for LLMs.
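The central object here is the decay of the loss Hessian's leading eigenvalues, i.e. whether they roughly follow λ_k ∝ k^(-s). The snippet below is a minimal sketch, not the authors' code: it computes the exact Hessian of a toy two-layer network's loss (the toy data, the model size, and the exact-Hessian route are assumptions made so the example stays self-contained; for real CNNs or LLMs one would estimate the top eigenvalues with Hessian-vector products and Lanczos) and then fits a power law to the leading eigenvalues on a log-log scale.

```python
# Minimal sketch: exact Hessian eigenvalues of a tiny model's loss,
# followed by a log-log least-squares fit of lambda_k ~ k^(-s).
import torch

torch.manual_seed(0)

# Toy regression data and a tiny two-layer network written as a function
# of one flat parameter vector, so the full Hessian is cheap to compute.
n, d_in, d_hid = 64, 5, 8
X = torch.randn(n, d_in)
y = torch.randn(n, 1)
n_params = d_in * d_hid + d_hid * 1  # weights only, no biases

def loss_fn(theta):
    W1 = theta[: d_in * d_hid].reshape(d_in, d_hid)
    W2 = theta[d_in * d_hid:].reshape(d_hid, 1)
    pred = torch.tanh(X @ W1) @ W2
    return ((pred - y) ** 2).mean()

theta = torch.randn(n_params)
H = torch.autograd.functional.hessian(loss_fn, theta)  # (n_params, n_params)
eigvals = torch.linalg.eigvalsh(H).flip(0)             # descending order

# Keep the leading positive eigenvalues and fit log(lambda_k) = -s*log(k) + c.
top = eigvals[eigvals > 1e-8][:30]
k = torch.arange(1, len(top) + 1, dtype=torch.float64)
A = torch.stack([k.log(), torch.ones_like(k)], dim=1)
coef = torch.linalg.lstsq(A, top.double().log().unsqueeze(1)).solution.squeeze()
print(f"estimated power-law exponent s ≈ {-coef[0].item():.3f}")
```

A large, clearly positive exponent on a well-trained model is the kind of signature the paper studies; at random initialization, as in this toy run, the fit mainly illustrates the procedure rather than the phenomenon.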
Cite
Text
Tang et al. "Investigating the Overlooked Hessian Structure: From CNNs to LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Tang et al. "Investigating the Overlooked Hessian Structure: From CNNs to LLMs." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/tang2025icml-investigating/)

BibTeX
@inproceedings{tang2025icml-investigating,
  title     = {{Investigating the Overlooked Hessian Structure: From CNNs to LLMs}},
  author    = {Tang, Qian-Yuan and Gu, Yufei and Cai, Yunfeng and Sun, Mingming and Li, Ping and Xun, Zhou and Xie, Zeke},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {58805--58831},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/tang2025icml-investigating/}
}