Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent

Abstract

Symmetries are abundant in the loss functions of neural networks, and understanding their impact on optimization algorithms is crucial for deep learning. We investigate the learning dynamics of Stochastic Gradient Descent (SGD) through the lens of exponential symmetries, a broad subclass of continuous symmetries in loss functions. Our analysis reveals that when gradient noise is imbalanced, SGD inherently drives model parameters toward a noise-balanced state, leading to the emergence of unique and attractive fixed points along degenerate directions. We prove that every parameter configuration $\theta$ is connected, without barriers, to a unique noise-balanced fixed point $\theta^*$. This finding offers a unified perspective on how symmetry and gradient noise influence SGD. The theory provides novel insights into deep learning phenomena such as progressive sharpening/flattening and warmup, demonstrating that noise balancing is a key mechanism underlying these effects.
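
As an illustration of the kind of exponential symmetry the abstract refers to, here is a minimal sketch of our own choosing (not reconstructed from the paper): consider a scalar factorized model $f(x) = u w x$ whose per-sample loss $\ell_i(uw)$ depends on the parameters only through the product $uw$. The loss is then invariant under the one-parameter group

$$(u, w) \;\mapsto\; \bigl(e^{\lambda} u,\; e^{-\lambda} w\bigr), \qquad \lambda \in \mathbb{R},$$

so the direction that rescales $u$ against $w$ is degenerate. Full-batch gradient flow conserves the associated quantity $u^2 - w^2$, since with $\dot{u} = -w\,\ell'(uw)$ and $\dot{w} = -u\,\ell'(uw)$,

$$\frac{d}{dt}\bigl(u^2 - w^2\bigr) = 2u\,\dot{u} - 2w\,\dot{w} = -2uw\,\ell'(uw) + 2wu\,\ell'(uw) = 0.$$

Minibatch gradient noise breaks this conservation law; the paper's result is that the resulting drift moves $(u, w)$ along the degenerate direction toward a unique, attractive fixed point at which the noise contributions of the two factors are balanced.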

Cite

Text

Ziyin et al. "Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent." NeurIPS 2024 Workshops: M3L, 2024.

Markdown

[Ziyin et al. "Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent." NeurIPS 2024 Workshops: M3L, 2024.](https://mlanthology.org/neuripsw/2024/ziyin2024neuripsw-parameter/)

BibTeX

@inproceedings{ziyin2024neuripsw-parameter,
  title     = {{Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent}},
  author    = {Ziyin, Liu and Wang, Mingze and Li, Hongchao and Wu, Lei},
  booktitle = {NeurIPS 2024 Workshops: M3L},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/ziyin2024neuripsw-parameter/}
}