Shape Matters: Understanding the Implicit Bias of the Noise Covariance

Abstract

The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization effect for training overparameterized models. Prior theoretical work largely focuses on spherical Gaussian noise, whereas empirical studies demonstrate that parameter-dependent noise (induced by mini-batches or label perturbation) is far more effective than Gaussian noise. This paper theoretically characterizes this phenomenon on a quadratically-parameterized model introduced by Vaskevicius et al. and Woodworth et al. We show that in an over-parameterized setting, SGD with label noise recovers the sparse ground truth from an arbitrary initialization, whereas SGD with Gaussian noise or gradient descent overfits to dense solutions with large norms. Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
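
To make the setting concrete, below is a minimal toy sketch (not the authors' code) of the comparison the abstract describes: a quadratically-parameterized linear predictor theta = u*u - v*v with a sparse ground truth, trained with either fresh label noise or spherical Gaussian noise added to the gradient. The dimensions, step size, and noise level are illustrative assumptions and may need tuning before the two runs visibly separate in the sparsity of their solutions.

```python
import numpy as np

# Illustrative sketch, assuming a toy instance of the quadratically-parameterized
# model theta = u*u - v*v (Vaskevicius et al.; Woodworth et al.) with a sparse
# ground truth. All sizes and hyperparameters below are assumptions.
rng = np.random.default_rng(0)
d, n, k = 100, 40, 3                          # dimension, samples, sparsity (assumed)
theta_star = np.zeros(d)
theta_star[:k] = 1.0                          # sparse ground truth
X = rng.standard_normal((n, d))
y = X @ theta_star                            # noiseless labels

def grads(u, v, labels):
    """Gradients of the squared loss with predictor theta = u*u - v*v."""
    resid = X @ (u * u - v * v) - labels
    g_theta = (2.0 / n) * (X.T @ resid)       # d loss / d theta
    return 2.0 * g_theta * u, -2.0 * g_theta * v   # chain rule through u and v

def train(noise, steps=30000, lr=1e-3, sigma=0.5):
    u = np.ones(d)
    v = np.ones(d)                            # a non-vanishing initialization
    for _ in range(steps):
        if noise == "label":                  # resample label noise at every step
            gu, gv = grads(u, v, y + sigma * rng.choice([-1.0, 1.0], size=n))
        else:                                 # spherical Gaussian gradient noise
            gu, gv = grads(u, v, y)
            gu = gu + sigma * rng.standard_normal(d)
            gv = gv + sigma * rng.standard_normal(d)
        u -= lr * gu
        v -= lr * gv
    return u * u - v * v

for kind in ("label", "gaussian"):
    theta = train(kind)
    err = float(np.linalg.norm(theta - theta_star))
    print(f"{kind} noise: recovery error = {err:.3f}, l1 norm = {np.abs(theta).sum():.1f}")
```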

Cite

Text

HaoChen et al. "Shape Matters: Understanding the Implicit Bias of the Noise Covariance." Conference on Learning Theory, 2021.

Markdown

[HaoChen et al. "Shape Matters: Understanding the Implicit Bias of the Noise Covariance." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/haochen2021colt-shape/)

BibTeX

@inproceedings{haochen2021colt-shape,
  title     = {{Shape Matters: Understanding the Implicit Bias of the Noise Covariance}},
  author    = {HaoChen, Jeff Z. and Wei, Colin and Lee, Jason and Ma, Tengyu},
  booktitle = {Conference on Learning Theory},
  year      = {2021},
  pages     = {2315--2357},
  volume    = {134},
  url       = {https://mlanthology.org/colt/2021/haochen2021colt-shape/}
}