Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks
Abstract
Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns – to our knowledge, the first efficiently scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues. Our method frames the problem as a series which principally square-roots and inverts the squared Hessian, then uses it to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
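To make the preconditioning idea in the abstract concrete, the following is a minimal sketch (not the authors' exact algorithm) of approximating |H|^{-1} g = (H^2)^{-1/2} g with a truncated binomial series evaluated purely through Hessian-vector products in JAX. The toy `loss`, the spectral bound `scale`, the truncation length, and the step size are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a neural-network training loss (illustrative assumption).
def loss(w):
    return 0.5 * jnp.sum(w ** 2) + 0.1 * jnp.sum(w ** 4) - jnp.sum(jnp.sin(w))

def hvp(w, v):
    # Hessian-vector product H v via forward-over-reverse autodiff,
    # without ever forming the Hessian matrix.
    return jax.jvp(jax.grad(loss), (w,), (v,))[1]

def abs_inv_hessian_vec(w, g, num_terms=20, scale=10.0):
    # Approximates |H|^{-1} g = (H^2)^{-1/2} g with the truncated binomial
    # series A^{-1/2} = sum_k binom(-1/2, k) (A - I)^k, applied to
    # A = H^2 / scale. Here `scale` is an assumed upper bound on the largest
    # eigenvalue of H^2, chosen so that A's spectrum lies in (0, 2) and the
    # series converges. Each extra term costs two Hessian-vector products.
    def a_times(v):
        return hvp(w, hvp(w, v)) / scale

    term = g                     # (A - I)^0 g
    coeff = 1.0                  # binom(-1/2, 0)
    result = jnp.zeros_like(g)
    for k in range(num_terms):
        result = result + coeff * term
        coeff = coeff * (-0.5 - k) / (k + 1)   # binom(-1/2, k+1)
        term = a_times(term) - term            # multiply by (A - I)
    # Undo the spectral rescaling: (H^2)^{-1/2} = A^{-1/2} / sqrt(scale).
    return result / jnp.sqrt(scale)

# Example saddle-free-Newton-style step on the toy loss.
w = jnp.array([0.3, -1.2, 0.7])
g = jax.grad(loss)(w)
w_next = w - 0.1 * abs_inv_hessian_vec(w, g)
```

Because the preconditioner acts on H^2 before taking the inverse square root, negative Hessian eigenvalues are mapped to their absolute values, which is what lets a Newton-like step escape saddle points rather than being attracted to them.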
Cite
Text
Oldewage et al. "Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks." Transactions on Machine Learning Research, 2024.
Markdown
[Oldewage et al. "Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/oldewage2024tmlr-series/)
BibTeX
@article{oldewage2024tmlr-series,
title = {{Series of Hessian-Vector Products for Tractable Saddle-Free Newton Optimisation of Neural Networks}},
author = {Oldewage, Elre Talea and Clarke, Ross M and Hernández-Lobato, José Miguel},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/oldewage2024tmlr-series/}
}