Bayesian Learning via Stochastic Gradient Langevin Dynamics
Abstract
In this paper we propose a new framework for learning from large-scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
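The idea of adding noise to a stochastic gradient step can be illustrated with a minimal sketch. It uses the update usually associated with this method: theta <- theta + (eps_t / 2) * (grad log p(theta) + (N / n) * sum_i grad log p(x_i | theta)) + eta_t, with eta_t ~ N(0, eps_t) and a decaying stepsize eps_t = a * (b + t)^(-gamma). The toy model (a Gaussian mean with known variance and a Gaussian prior), the function name `sgld_gaussian_mean`, and all parameter values are assumptions chosen for illustration; they are not among the three models named in the abstract.

```python
import numpy as np


def sgld_gaussian_mean(data, batch_size=20, n_steps=5000, a=0.01, b=10.0,
                       gamma=0.55, prior_var=10.0, noise_var=1.0, seed=0):
    """Sketch of SGLD for the mean of a Gaussian with known variance.

    Hypothetical toy setup: x_i ~ N(theta, noise_var), theta ~ N(0, prior_var).
    Returns the full trajectory of theta, one value per step.
    """
    rng = np.random.default_rng(seed)
    n_total = data.shape[0]
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        # Annealed (polynomially decaying) stepsize.
        eps = a * (b + t) ** (-gamma)
        # Draw a mini-batch without replacement.
        batch = rng.choice(data, size=batch_size, replace=False)
        # Gradient of the log prior and the rescaled mini-batch log likelihood.
        grad_log_prior = -theta / prior_var
        grad_log_lik = (n_total / batch_size) * np.sum(batch - theta) / noise_var
        # Half-step along the stochastic gradient plus Gaussian noise of variance eps.
        noise = rng.normal(0.0, np.sqrt(eps))
        theta = theta + 0.5 * eps * (grad_log_prior + grad_log_lik) + noise
        samples[t] = theta
    return samples


# Synthetic data for the toy problem (assumed values, not from the paper).
data = np.random.default_rng(1).normal(2.0, 1.0, size=200)
samples = sgld_gaussian_mean(data)

# Closed-form posterior mean for this conjugate model, for comparison.
prior_var, noise_var = 10.0, 1.0
posterior_mean = (data.sum() / noise_var) / (1.0 / prior_var + data.size / noise_var)
```

In this sketch the early iterates behave like stochastic gradient ascent, and the later ones, once the injected noise dominates the mini-batch gradient noise, scatter around the posterior mean. Discarding an initial stretch of the trajectory (for example `samples[1000:]`) is a crude stand-in for the sampling threshold described in the abstract, not an implementation of it.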
Cite
Text
Welling and Teh. "Bayesian Learning via Stochastic Gradient Langevin Dynamics." International Conference on Machine Learning, 2011.
Markdown
[Welling and Teh. "Bayesian Learning via Stochastic Gradient Langevin Dynamics." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/welling2011icml-bayesian/)
BibTeX
@inproceedings{welling2011icml-bayesian,
title = {{Bayesian Learning via Stochastic Gradient Langevin Dynamics}},
author = {Welling, Max and Teh, Yee Whye},
booktitle = {International Conference on Machine Learning},
year = {2011},
pages = {681--688},
url = {https://mlanthology.org/icml/2011/welling2011icml-bayesian/}
}