Unveiling M-Sharpness Through the Structure of Stochastic Gradient Noise

Abstract

Sharpness-aware minimization (SAM) has emerged as a highly effective technique for improving model generalization, but its underlying principles are not fully understood. We investigate m-sharpness, the phenomenon whereby SAM's performance improves monotonically as the micro-batch size used to compute perturbations decreases; it is critical for distributed training yet lacks a rigorous explanation. We leverage an extended Stochastic Differential Equation (SDE) framework and analyze stochastic gradient noise (SGN) to characterize the dynamics of SAM variants, including n-SAM and m-SAM. Our analysis reveals that stochastic perturbations induce an implicit variance-based sharpness regularization whose strength increases as m decreases. Motivated by this insight, we propose Reweighted SAM (RW-SAM), which employs sharpness-weighted sampling to mimic the generalization benefits of m-SAM while remaining parallelizable. Comprehensive experiments validate our theory and method.
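
For readers unfamiliar with the m-SAM procedure the abstract refers to, the following is a minimal PyTorch-style sketch, not the authors' implementation: the batch is split into micro-batches of size m, each micro-batch computes its own ascent perturbation of radius rho, and the descent gradients at the perturbed points are averaged before the optimizer step. The names msam_step, model, loss_fn, rho, and m are illustrative placeholders.

import torch

def msam_step(model, loss_fn, optimizer, inputs, targets, rho=0.05, m=16):
    # One m-SAM update: per-micro-batch perturbation, averaged descent gradient.
    optimizer.zero_grad()
    params = [p for p in model.parameters() if p.requires_grad]
    num_micro = (inputs.size(0) + m - 1) // m
    for x, y in zip(inputs.split(m), targets.split(m)):
        # 1) Ascent step: gradient of the micro-batch loss at the current weights.
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, params)
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        # Perturb weights by rho * g / ||g||, computed on this micro-batch only.
        with torch.no_grad():
            eps = [rho * g / (grad_norm + 1e-12) for g in grads]
            for p, e in zip(params, eps):
                p.add_(e)
        # 2) Descent gradient at the perturbed point, accumulated (and averaged) in .grad.
        (loss_fn(model(x), y) / num_micro).backward()
        # 3) Restore the original weights before processing the next micro-batch.
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.sub_(e)
    optimizer.step()

Smaller m means more, noisier perturbations per update, which is the setting the paper's SGN-based analysis ties to stronger implicit variance-based sharpness regularization.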

Cite

Text

Luo et al. "Unveiling M-Sharpness Through the Structure of Stochastic Gradient Noise." Advances in Neural Information Processing Systems, 2025.

Markdown

[Luo et al. "Unveiling M-Sharpness Through the Structure of Stochastic Gradient Noise." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/luo2025neurips-unveiling/)

BibTeX

@inproceedings{luo2025neurips-unveiling,
  title     = {{Unveiling M-Sharpness Through the Structure of Stochastic Gradient Noise}},
  author    = {Luo, Haocheng and Harandi, Mehrtash and Phung, Dinh and Le, Trung},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/luo2025neurips-unveiling/}
}