Early Stopping as Nonparametric Variational Inference

Abstract

We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined as the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to select hyperparameters in place of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.
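The following is a minimal sketch, not the authors' code, of the estimator described in the abstract. It assumes a small full-batch gradient-descent setting on an illustrative Bayesian linear regression model (the functions neg_log_joint, grad_and_hessian, lower_bound_estimate, the learning rate, and the Gaussian initial distribution are all assumptions for illustration). The idea: one optimization step theta -> theta - lr * g changes the entropy of the implicit distribution by log|det(I - lr * H)|, so accumulating these terms plus the initial entropy and the final log joint gives a single-sample estimate of the variational lower bound. The paper's scalable version approximates this log-determinant rather than computing it exactly as done here.

```python
import numpy as np

def neg_log_joint(theta, X, y, weight_decay=1.0):
    """Negative log joint for Bayesian linear regression (illustrative model)."""
    resid = X @ theta - y
    return 0.5 * resid @ resid + 0.5 * weight_decay * theta @ theta

def grad_and_hessian(theta, X, y, weight_decay=1.0):
    """Gradient and Hessian of the negative log joint (exact for this toy model)."""
    g = X.T @ (X @ theta - y) + weight_decay * theta
    H = X.T @ X + weight_decay * np.eye(len(theta))
    return g, H

def lower_bound_estimate(X, y, lr=1e-3, num_steps=200, init_scale=1.0, seed=0):
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Draw one sample from the Gaussian initial distribution q_0,
    # and start with its (known) entropy.
    theta = init_scale * rng.standard_normal(D)
    entropy = 0.5 * D * (1.0 + np.log(2 * np.pi)) + D * np.log(init_scale)
    for _ in range(num_steps):
        g, H = grad_and_hessian(theta, X, y)
        # Entropy change of the implicit distribution under this descent step.
        _, logdet = np.linalg.slogdet(np.eye(D) - lr * H)
        entropy += logdet
        theta = theta - lr * g
    # Single-sample estimate of E_q[log p(theta, data)] + H[q].
    return -neg_log_joint(theta, X, y) + entropy
```

Stopping the loop early corresponds to a less-concentrated implicit posterior with higher entropy, which is how the bound formalizes early stopping; the bound can then be compared across hyperparameter settings (e.g. weight_decay) without a validation set.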

Cite

Text

Duvenaud et al. "Early Stopping as Nonparametric Variational Inference." International Conference on Artificial Intelligence and Statistics, 2016.

Markdown

[Duvenaud et al. "Early Stopping as Nonparametric Variational Inference." International Conference on Artificial Intelligence and Statistics, 2016.](https://mlanthology.org/aistats/2016/duvenaud2016aistats-early/)

BibTeX

@inproceedings{duvenaud2016aistats-early,
  title     = {{Early Stopping as Nonparametric Variational Inference}},
  author    = {Duvenaud, David and Maclaurin, Dougal and Adams, Ryan P.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2016},
  pages     = {1070--1077},
  url       = {https://mlanthology.org/aistats/2016/duvenaud2016aistats-early/}
}