νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints

Abstract

Sharpness-aware minimization (SAM) has been shown to improve the generalization of neural networks. However, the method comes at the cost of storing a perturbation of the model parameters, which can be restrictive in memory-bound settings. We design a variant of SAM, called $\nu$SAM, which obtains a low-rank perturbation by modifying the perturbation constraint. The update almost entirely removes the memory footprint of the perturbation without increasing the computational complexity, achieving close to a $1/3$ saving in parameter memory when SGD is used as the base optimizer. We demonstrate that $\nu$SAM performs comparably to SAM on vision transformers, both when training models from scratch and when fine-tuning. Interestingly, $\nu$SAM appears to significantly improve performance for MLP-Mixer architectures in both settings. The results are corroborated theoretically: we show that SAM with an \emph{arbitrary} norm choice (which includes $\nu$SAM) can converge even with a fixed perturbation radius.
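
The abstract only sketches the mechanism, so the following is a minimal illustration of the underlying idea rather than the authors' implementation. The power-iteration routine, the radius value, and the per-matrix treatment are assumptions; the point is that maximizing the linearized loss over a nuclear-norm ball yields a rank-1 perturbation that is stored as two vectors, whereas SAM's $\ell_2$-ball ascent step stores a dense copy of the weight gradient.

```python
# Sketch (not the authors' code) of the inner maximization that distinguishes
# a nuclear-norm-constrained SAM step from the standard l2-constrained one.
# For a weight-matrix gradient G:
#   - SAM (l2 / Frobenius ball): ascent direction rho * G / ||G||_F, dense.
#   - Nuclear-norm ball ||eps||_* <= rho: <G, eps> is maximized at the rank-1
#     extreme point rho * u1 v1^T, where (u1, v1) are G's top singular vectors.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((512, 256))   # gradient w.r.t. one weight matrix
rho = 0.05                            # perturbation radius (hypothetical value)

# SAM-style perturbation: a full matrix the size of the weights.
eps_sam = rho * G / np.linalg.norm(G)

# Rank-1 perturbation: leading singular pair of G via a few power iterations,
# avoiding a full SVD. Only the two factor vectors need to be stored.
u = rng.standard_normal(G.shape[0])
for _ in range(10):
    v = G.T @ u
    v /= np.linalg.norm(v)
    u = G @ v
    u /= np.linalg.norm(u)
eps_lowrank_factors = (rho * u, v)    # perturbation is rho * outer(u, v)

print("dense perturbation entries:  ", eps_sam.size)        # 131072
print("rank-1 perturbation entries: ", u.size + v.size)     # 768
```

Under these assumptions, the ascent-step weights can be formed as `W + rho * np.outer(u, v)` only when needed (or applied implicitly during the extra forward/backward pass), which is where the memory saving over SAM's dense perturbation comes from.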

Cite

Text

Pethick et al. "νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints." Transactions on Machine Learning Research, 2025.

Markdown

[Pethick et al. "νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/pethick2025tmlr-sam/)

BibTeX

@article{pethick2025tmlr-sam,
  title     = {{νSAM: Memory-Efficient Sharpness-Aware Minimization via Nuclear Norm Constraints}},
  author    = {Pethick, Thomas and Raman, Parameswaran and Minorics, Lenon and Hong, Mingyi and Sabach, Shoham and Cevher, Volkan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/pethick2025tmlr-sam/}
}