The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

Abstract

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as incorporating a third-derivative term---the derivative of the Hessian in the leading eigenvector direction---that encourages drift toward wider minima.
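The oscillation described above is easy to observe numerically. Below is a minimal sketch of the (gradient-normalized) SAM update applied to a convex quadratic; the step size `eta`, perturbation radius `rho`, and the diagonal Hessian are illustrative choices, not values from the paper. The iterates settle into a period-2 cycle that straddles the minimum along the highest-curvature axis, while the low-curvature coordinate decays toward zero.

```python
import numpy as np

def sam_step(w, grad_fn, eta=0.1, rho=0.1):
    """One SAM step: ascend rho along the normalized gradient,
    then descend using the gradient at the perturbed point."""
    g = grad_fn(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + 1e-12)  # ascent (perturbation) step
    return w - eta * grad_fn(w_adv)                    # descent step

# Convex quadratic L(w) = 0.5 * w^T H w with curvatures 4 and 1.
H = np.diag([4.0, 1.0])
grad = lambda w: H @ w

w = np.array([1.0, 0.5])
trajectory = [w]
for _ in range(1000):
    w = sam_step(w, grad)
    trajectory.append(w)

# The last two iterates form a 2-cycle: the first coordinate (largest
# curvature) flips sign each step, the second has decayed to ~0.
print(trajectory[-2], trajectory[-1])
```

In this 2D example the cycle bounces between roughly ±0.025 along the leading eigenvector; changing `eta` or `rho` changes the amplitude of the bounce but not the qualitative period-2 behavior.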

Cite

Text

Bartlett et al. "The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima." Journal of Machine Learning Research, 2023.

Markdown

[Bartlett et al. "The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima." Journal of Machine Learning Research, 2023.](https://mlanthology.org/jmlr/2023/bartlett2023jmlr-dynamics/)

BibTeX

@article{bartlett2023jmlr-dynamics,
  title     = {{The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima}},
  author    = {Bartlett, Peter L. and Long, Philip M. and Bousquet, Olivier},
  journal   = {Journal of Machine Learning Research},
  year      = {2023},
  pages     = {1--36},
  volume    = {24},
  url       = {https://mlanthology.org/jmlr/2023/bartlett2023jmlr-dynamics/}
}