Charting Flat Minima Using the Conserved Quantities of Gradient Flow

Abstract

Empirical studies have revealed that many minima in the loss landscape of deep learning are connected and reside in a low-loss valley. We present a general framework for finding continuous symmetries in the parameter space that give rise to these low-loss valleys. We introduce a novel set of nonlinear, data-dependent symmetries for neural networks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along the minima. The distribution of conserved quantities reveals that, under common initialization methods, gradient flow explores only a small part of the global minimum. By relating conserved quantities to the convergence rate and the sharpness of the minimum, we provide insights into how initialization impacts convergence and generalizability. We also find the nonlinear symmetry action viable for building ensembles that improve robustness under certain adversarial attacks.
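
To make the notion of a conserved quantity concrete, below is a minimal sketch (illustrative, not code from the paper) of the standard conservation law for a two-layer linear network f(x) = W2 W1 x: under gradient flow on the squared loss, Q = W1 W1^T - W2^T W2 is invariant under the hidden-layer symmetry W1 -> G W1, W2 -> W2 G^{-1}, and small-step gradient descent preserves Q approximately. All names, dimensions, and hyperparameters here are assumptions chosen for the demo.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n = 5, 4, 3, 32
X = rng.normal(size=(d_in, n))
Y = rng.normal(size=(d_out, n))
W1 = 0.1 * rng.normal(size=(d_hidden, d_in))
W2 = 0.1 * rng.normal(size=(d_out, d_hidden))

def conserved(W1, W2):
    # Conserved quantity associated with the linear symmetry of the
    # hidden layer: invariant under gradient flow of the squared loss.
    return W1 @ W1.T - W2.T @ W2

Q0 = conserved(W1, W2)
lr = 1e-3
for _ in range(2000):
    R = W2 @ W1 @ X - Y          # residual of the squared loss
    gW2 = R @ (W1 @ X).T / n     # dL/dW2
    gW1 = W2.T @ R @ X.T / n     # dL/dW1
    W1 -= lr * gW1
    W2 -= lr * gW2

# The relative drift of Q is O(lr); it vanishes in the gradient-flow limit.
print(np.linalg.norm(conserved(W1, W2) - Q0) / np.linalg.norm(Q0))

Because Q is fixed by the initialization and unchanged by training, its value can serve as a coordinate labeling which point of a connected minimum gradient flow reaches, which is the role conserved quantities play in the paper.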

Cite

Text

Zhao et al. "Charting Flat Minima Using the Conserved Quantities of Gradient Flow." NeurIPS 2022 Workshops: NeurReps, 2022.

Markdown

[Zhao et al. "Charting Flat Minima Using the Conserved Quantities of Gradient Flow." NeurIPS 2022 Workshops: NeurReps, 2022.](https://mlanthology.org/neuripsw/2022/zhao2022neuripsw-charting/)

BibTeX

@inproceedings{zhao2022neuripsw-charting,
  title     = {{Charting Flat Minima Using the Conserved Quantities of Gradient Flow}},
  author    = {Zhao, Bo and Ganev, Iordan and Walters, Robin and Yu, Rose and Dehmamy, Nima},
  booktitle = {NeurIPS 2022 Workshops: NeurReps},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/zhao2022neuripsw-charting/}
}