A New Characterization of the Edge of Stability Based on a Sharpness Measure Aware of Batch Gradient Distribution

Abstract

For full-batch gradient descent (GD), it has been shown empirically that the sharpness, i.e., the top eigenvalue of the Hessian, increases and then hovers above $2/\text{(learning rate)}$; this is called the ``edge of stability'' phenomenon. However, it remains unclear why the sharpness settles somewhat above $2/\text{(learning rate)}$, or how this characterization extends to general mini-batch stochastic gradient descent (SGD). We propose a new sharpness measure, \emph{interaction-aware sharpness}, which accounts for the interaction between the batch gradient distribution and the loss landscape geometry. This leads to a more refined and general characterization of the edge of stability for SGD. Moreover, by analyzing a concentration measure of the batch gradient, we derive a more accurate scaling rule between batch size and learning rate, the Linear and Saturation Scaling Rule (LSSR).
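The abstract's starting point is the classical edge-of-stability criterion: full-batch GD becomes linearly unstable along the top Hessian eigendirection once the sharpness exceeds $2/\text{(learning rate)}$. The following PyTorch sketch (not from the paper) illustrates that baseline quantity: it estimates the top Hessian eigenvalue by power iteration on Hessian-vector products and compares it against the $2/\text{(learning rate)}$ threshold. The helper name top_hessian_eigenvalue and the toy quadratic loss are illustrative assumptions; the paper's interaction-aware sharpness additionally depends on the batch gradient distribution and is not reproduced here.

import torch

def top_hessian_eigenvalue(loss_fn, params, n_iter=100, tol=1e-6):
    # Power iteration on Hessian-vector products: differentiating the
    # gradient-vector dot product a second time yields Hv without ever
    # forming the Hessian. Note power iteration converges to the
    # eigenvalue of largest magnitude, which equals the sharpness when
    # the dominant curvature is positive.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])

    v = torch.randn_like(flat_grad)
    v = v / v.norm()
    eig = 0.0
    for _ in range(n_iter):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        new_eig = float(v @ hv)  # Rayleigh quotient with unit-norm v
        if abs(new_eig - eig) < tol:
            return new_eig
        eig = new_eig
        v = hv / (hv.norm() + 1e-12)
    return eig

# Toy quadratic loss 0.5 * w^T A w, whose Hessian is A (top eigenvalue 4.0).
torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)
A = torch.diag(torch.tensor([4.0, 1.0, 0.5]))
lam = top_hessian_eigenvalue(lambda: 0.5 * w @ (A @ w), [w])
lr = 0.4
print(f"sharpness ~ {lam:.3f}  vs  2/lr = {2 / lr:.3f}")
# GD with this learning rate is stable here (4.0 < 5.0); at the edge of
# stability, the sharpness would instead hover around the 2/lr threshold.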

Cite

Text

Lee and Jang. "A New Characterization of the Edge of Stability Based on a Sharpness Measure Aware of Batch Gradient Distribution." International Conference on Learning Representations, 2023.

Markdown

[Lee and Jang. "A New Characterization of the Edge of Stability Based on a Sharpness Measure Aware of Batch Gradient Distribution." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/lee2023iclr-new/)

BibTeX

@inproceedings{lee2023iclr-new,
  title     = {{A New Characterization of the Edge of Stability Based on a Sharpness Measure Aware of Batch Gradient Distribution}},
  author    = {Lee, Sungyoon and Jang, Cheongjae},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/lee2023iclr-new/}
}