Almost Bayesian: Dynamics of SGD Through Singular Learning Theory

Abstract

The relationship between Bayesian sampling and stochastic gradient descent (SGD) in neural networks has been a long-standing open question in the theory of deep learning. We shed light on this question by modeling the long-runtime behaviour of SGD as diffusion on porous media. Using singular learning theory, we show that the late-stage dynamics are strongly impacted by the degeneracies of the loss surface. From this, we show that under reasonable choices of hyperparameters for vanilla SGD, the local steady-state distribution of SGD (if it exists) is effectively a tempered version of the Bayesian posterior over the weights that accounts for local accessibility constraints.

Cite

Text

Hennick and De Baerdemacker. "Almost Bayesian: Dynamics of SGD Through Singular Learning Theory." International Conference on Learning Representations, 2026.

Markdown

[Hennick and De Baerdemacker. "Almost Bayesian: Dynamics of SGD Through Singular Learning Theory." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hennick2026iclr-almost/)

BibTeX

@inproceedings{hennick2026iclr-almost,
  title     = {{Almost Bayesian: Dynamics of SGD Through Singular Learning Theory}},
  author    = {Hennick, Max and De Baerdemacker, Stijn},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hennick2026iclr-almost/}
}