Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States

Abstract

Stochastic differential equations (SDEs) have recently been shown to characterize well the dynamics of training machine learning models with SGD. This offers two opportunities for better understanding the generalization behaviour of SGD through its SDE approximation. First, viewing SGD as full-batch gradient descent with Gaussian gradient noise allows us to obtain a trajectory-based generalization bound via information-theoretic analysis. Second, under mild conditions, we estimate the steady-state weight distribution of the SDE and use information-theoretic bounds to establish terminal-state-based generalization bounds.
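For context, the SDE approximation referred to here typically takes the following standard form (a sketch of the common formulation in this literature, not necessarily the exact equation used in the paper; the learning rate $\eta$ and noise covariance $\Sigma$ below are illustrative notation):

$$\mathrm{d}W_t = -\nabla L(W_t)\,\mathrm{d}t + \sqrt{\eta}\,\Sigma(W_t)^{1/2}\,\mathrm{d}B_t,$$

where $L$ is the empirical loss and $B_t$ is a standard Brownian motion. The minibatch gradient noise of SGD is modeled by the Gaussian diffusion term, which is what enables both the trajectory-based and the steady-state analyses described above.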

Cite

Text

Wang and Mao. "Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States." NeurIPS 2023 Workshops: M3L, 2023.

Markdown

[Wang and Mao. "Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States." NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/wang2023neuripsw-two/)

BibTeX

@inproceedings{wang2023neuripsw-two,
  title     = {{Two Facets of SDE Under an Information-Theoretic Lens: Generalization of SGD via Training Trajectories and via Terminal States}},
  author    = {Wang, Ziqiao and Mao, Yongyi},
  booktitle = {NeurIPS 2023 Workshops: M3L},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/wang2023neuripsw-two/}
}