An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

Abstract

To understand the dynamics of training in deep neural networks, we study the evolution of the Hessian eigenvalue density throughout the optimization process. In non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In a batch normalized network, these two effects are almost absent. We give a theoretical rationale to partially explain these phenomena. As part of this work, we adapt advanced tools from numerical linear algebra that allow scalable and accurate estimation of the entire Hessian spectrum of ImageNet-scale neural networks; this technique may be of independent interest in other applications.
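The scalable spectrum estimator the abstract alludes to is built on stochastic Lanczos quadrature over Hessian-vector products. Below is a minimal NumPy sketch of that general idea, not the authors' implementation: the lanczos and spectral_density helpers, their parameters, and the toy stand-in Hessian are all illustrative assumptions.

import numpy as np

def lanczos(hvp, dim, m, rng):
    # Run m Lanczos iterations with full reorthogonalization, using only
    # Hessian-vector products. Returns the (truncated) tridiagonal matrix T.
    Q = np.zeros((dim, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    q = rng.standard_normal(dim)
    q /= np.linalg.norm(q)
    k = m
    for j in range(m):
        Q[:, j] = q
        w = hvp(q)
        alpha[j] = q @ w
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        # Reorthogonalize against all previous Lanczos vectors for stability.
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        beta[j] = np.linalg.norm(w)
        if beta[j] < 1e-10:
            k = j + 1
            break
        q = w / beta[j]
    return np.diag(alpha[:k]) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)

def spectral_density(hvp, dim, m=80, n_probes=10, grid=None, sigma=0.05, seed=0):
    # Approximate the eigenvalue density by averaging Gaussian-broadened
    # Ritz values (with quadrature weights) over several random probe vectors.
    rng = np.random.default_rng(seed)
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 500)
    density = np.zeros_like(grid)
    for _ in range(n_probes):
        T = lanczos(hvp, dim, m, rng)
        ritz_vals, ritz_vecs = np.linalg.eigh(T)
        weights = ritz_vecs[0, :] ** 2  # quadrature weights from the first row
        for lam, w in zip(ritz_vals, weights):
            density += w * np.exp(-(grid - lam) ** 2 / (2 * sigma ** 2))
    density /= n_probes * np.sqrt(2 * np.pi) * sigma
    return grid, density

if __name__ == "__main__":
    # Toy stand-in "Hessian" so the sketch runs end to end; in practice hvp
    # would be an automatic-differentiation Hessian-vector product.
    dim = 200
    A = np.random.default_rng(1).standard_normal((dim, dim)) / np.sqrt(dim)
    H = (A + A.T) / 2
    grid, density = spectral_density(lambda v: H @ v, dim)
    print("density peaks near", grid[np.argmax(density)])

The key design point, under these assumptions, is that only matrix-vector products with the Hessian are needed, so the cost per probe is a modest number of backpropagation-like passes rather than anything scaling with the square of the parameter count.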

Cite

Text

Ghorbani et al. "An Investigation into Neural Net Optimization via Hessian Eigenvalue Density." International Conference on Machine Learning, 2019.

Markdown

[Ghorbani et al. "An Investigation into Neural Net Optimization via Hessian Eigenvalue Density." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/ghorbani2019icml-investigation/)

BibTeX

@inproceedings{ghorbani2019icml-investigation,
  title     = {{An Investigation into Neural Net Optimization via Hessian Eigenvalue Density}},
  author    = {Ghorbani, Behrooz and Krishnan, Shankar and Xiao, Ying},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {2232--2241},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/ghorbani2019icml-investigation/}
}