Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods

Abstract

In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. A key ingredient is an effective but cheap scheme for incorporating local curvature information. Since the true Hessian matrix is often the sum of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses as much of the available partial Hessian information as possible. By further exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations, the quasi-Newton direction can be computed at an affordable cost. Global convergence to a stationary point and a local superlinear convergence rate are established under mild assumptions. Numerical results on logistic regression, deep autoencoder networks and deep convolutional neural networks show that the proposed method is competitive with state-of-the-art methods.
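To make the "cheap part plus expensive part" idea concrete, the following is a minimal sketch and not the authors' algorithm. It is deterministic and full-batch, uses a dense 2x2 BFGS matrix, and does not show the low-rank or Kronecker-product structure mentioned in the abstract. The toy objective, the matrix `C`, and all function names (`structured_qn`, `bfgs_update`, `grad_expensive`, and so on) are assumptions introduced only for illustration. The Hessian approximation is written as B = C + A, where C is the exactly known cheap curvature and A is a BFGS estimate built only from gradient differences of the expensive part.

```python
import math

# Hypothetical toy problem (not from the paper):
#   f(x) = 0.5 * x^T C x + sum_i softplus(x_i - 1)
# The quadratic term plays the "cheap" part: its Hessian C is known exactly.
# The softplus term plays the "expensive" part: only its gradient is used.
C = [[2.0, 0.5], [0.5, 1.0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def f(x):
    return 0.5 * dot(x, matvec(C, x)) + sum(math.log1p(math.exp(t - 1.0)) for t in x)

def grad_expensive(x):
    # Gradient of the softplus term: sigmoid(x_i - 1).
    return [1.0 / (1.0 + math.exp(-(t - 1.0))) for t in x]

def grad(x):
    return [a + b for a, b in zip(matvec(C, x), grad_expensive(x))]

def solve2(M, b):
    # Solve a 2x2 linear system M z = b by Cramer's rule.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(b[0] * M[1][1] - M[0][1] * b[1]) / det,
            (M[0][0] * b[1] - M[1][0] * b[0]) / det]

def bfgs_update(A, s, y):
    # Standard BFGS update of A so that the new matrix maps s to y.
    # Skipped when the curvature condition s^T y > 0 fails.
    sy = dot(s, y)
    if sy <= 1e-10 * dot(s, s):
        return A
    As = matvec(A, s)
    sAs = dot(s, As)
    return [[A[i][j] - As[i] * As[j] / sAs + y[i] * y[j] / sy
             for j in range(2)] for i in range(2)]

def structured_qn(x, max_iter=100, tol=1e-8):
    A = [[0.25, 0.0], [0.0, 0.25]]  # initial guess for the expensive curvature
    g = grad(x)
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # B = exact cheap Hessian + quasi-Newton estimate of the rest.
        B = [[C[i][j] + A[i][j] for j in range(2)] for i in range(2)]
        d = solve2(B, [-gi for gi in g])
        # Backtracking (Armijo) line search along d.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while t > 1e-8 and f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        s = [a - b for a, b in zip(x_new, x)]
        # Structured secant pair: only the expensive part's gradient change.
        y = [a - b for a, b in zip(grad_expensive(x_new), grad_expensive(x))]
        A = bfgs_update(A, s, y)
        x, g = x_new, grad(x_new)
    return x

x_star = structured_qn([2.0, -1.0])
print("gradient norm at x_star:", math.sqrt(dot(grad(x_star), grad(x_star))))
```

Because the cheap curvature `C` enters `B` exactly, the quasi-Newton update only has to learn the remaining part of the Hessian. In a stochastic setting the gradients and the pair `(s, y)` would come from mini-batches, which this sketch does not attempt to model.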

Cite

Text

Yang et al. "Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01051

Markdown

[Yang et al. "Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/yang2021cvpr-enhance/) doi:10.1109/CVPR46437.2021.01051

BibTeX

@inproceedings{yang2021cvpr-enhance,
  title     = {{Enhance Curvature Information by Structured Stochastic Quasi-Newton Methods}},
  author    = {Yang, Minghan and Xu, Dong and Chen, Hongyu and Wen, Zaiwen and Chen, Mengyun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {10654--10663},
  doi       = {10.1109/CVPR46437.2021.01051},
  url       = {https://mlanthology.org/cvpr/2021/yang2021cvpr-enhance/}
}