Implicit Regularization of SGD Reduces Shortcut Learning

Mirzaie, Nahal; Alipanah, Alireza; Abbasi, Ali; Farzane, Amirmahdi; Jafarinia, Hossein; Sobhaei, Erfan; Ghaznavi, Mahdi; Najafi, Amir; Baghshah, Mahdieh Soleymani; Rohban, Mohammad Hossein

ICLR 2026

Abstract

Training with stochastic gradient descent (SGD) at moderately large learning rates has been observed to improve robustness against spurious correlations, that is, strong correlations between non-predictive features and target labels. Yet the mechanism underlying this effect remains unclear. In this work, we identify batch size as an additional critical factor and show that robustness gains arise from the implicit regularization of SGD, which intensifies with larger learning rates and smaller batch sizes. This implicit regularization reduces reliance on spurious or shortcut features, thereby enhancing robustness while preserving accuracy. Importantly, this effect appears unique to SGD: gradient descent (GD) does not confer the same benefit and may even exacerbate shortcut reliance. Theoretically, we establish this phenomenon in linear models by leveraging statistical formulations of spurious correlations, proving that SGD systematically suppresses spurious feature dependence. Empirically, we demonstrate that the effect extends to deep neural networks across multiple benchmarks. Our code is available at https://github.com/mirzanahal/sgd-implicit-regularization-shortcuts.
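To make the comparison in the abstract concrete, the following is a minimal sketch of how one might probe it on a toy linear model. It is not the authors' code. The synthetic data generator, the logistic loss, the function names (`make_spurious_data`, `train_linear`, `spurious_reliance`), and all hyperparameter values are assumptions chosen for illustration. The sketch only sets up the comparison between full-batch GD and mini-batch SGD at different learning rates and batch sizes; whether a gap shows up in this toy setting depends on the settings, and the paper's own experiments and released code remain the reference.

```python
import numpy as np


def make_spurious_data(n=2000, corr=0.95, noise=1.0, seed=0):
    """Hypothetical toy data: labels y in {-1, +1}; column 0 is a noisy core
    feature, column 1 is a spurious feature agreeing with y with prob. `corr`."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)
    core = y + noise * rng.standard_normal(n)
    agree = rng.random(n) < corr
    spurious = np.where(agree, y, -y) + 0.1 * rng.standard_normal(n)
    return np.stack([core, spurious], axis=1), y


def train_linear(X, y, lr, batch_size, epochs=50, seed=0):
    """Mini-batch SGD on the logistic loss of a linear model.
    Setting batch_size=len(X) recovers full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            margins = np.clip(y[idx] * (X[idx] @ w), -50.0, 50.0)
            # Gradient of mean log(1 + exp(-margin)) with respect to w.
            coef = -y[idx] / (1.0 + np.exp(margins))
            w -= lr * (coef[:, None] * X[idx]).mean(axis=0)
    return w


def spurious_reliance(w):
    """Share of total absolute weight placed on the spurious feature (in [0, 1])."""
    return abs(w[1]) / (abs(w[0]) + abs(w[1]) + 1e-12)


X, y = make_spurious_data()
# (learning rate, batch size): full-batch GD, small-batch SGD, larger-lr SGD.
for lr, bs in [(0.01, len(X)), (0.01, 32), (0.5, 32)]:
    w = train_linear(X, y, lr=lr, batch_size=bs)
    print(f"lr={lr}, batch_size={bs}, spurious reliance={spurious_reliance(w):.3f}")
```

The reliance measure here is a simple weight ratio picked for readability; a worst-group accuracy on samples where the spurious feature disagrees with the label would be a closer analogue of the robustness evaluations commonly used for shortcut learning.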

Cite

Text

Mirzaie et al. "Implicit Regularization of SGD Reduces Shortcut Learning." International Conference on Learning Representations, 2026.

Markdown

[Mirzaie et al. "Implicit Regularization of SGD Reduces Shortcut Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/mirzaie2026iclr-implicit/)

BibTeX

@inproceedings{mirzaie2026iclr-implicit,
  title     = {{Implicit Regularization of SGD Reduces Shortcut Learning}},
  author    = {Mirzaie, Nahal and Alipanah, Alireza and Abbasi, Ali and Farzane, Amirmahdi and Jafarinia, Hossein and Sobhaei, Erfan and Ghaznavi, Mahdi and Najafi, Amir and Baghshah, Mahdieh Soleymani and Rohban, Mohammad Hossein},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/mirzaie2026iclr-implicit/}
}