Law of Large Numbers for Bayesian Two-Layer Neural Network Trained with Variational Inference

Abstract

We provide a rigorous analysis of training Bayesian neural networks by variational inference (VI) in the two-layer, infinite-width case. We consider a regression problem with a regularized evidence lower bound (ELBO), decomposed into the expected log-likelihood of the data and the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. With an appropriate weighting of the KL term, we prove a law of large numbers for three different training schemes: (i) the idealized case with exact estimation of a multiple Gaussian integral arising from the reparametrization trick, (ii) a minibatch scheme using Monte Carlo sampling, commonly known as Bayes by Backprop, and (iii) a new, computationally cheaper algorithm which we introduce as Minimal VI. An important result is that all methods converge to the same mean-field limit. Finally, we illustrate our results numerically and discuss the need for the derivation of a central limit theorem.
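
To make the objective concrete, the following is a minimal sketch of training scheme (ii), a Bayes-by-Backprop-style one-sample reparametrization-trick estimate of a weighted negative ELBO for a two-layer network with a mean-field Gaussian variational posterior. The standard-normal prior, the KL weight KL_SCALE, the 1/N output scaling, and all hyperparameters here are illustrative assumptions, not the paper's exact construction.

import torch

N_HIDDEN = 64                # hidden width (the paper studies the infinite-width limit)
D_IN = 1                     # input dimension of a toy regression problem
KL_SCALE = 1.0 / N_HIDDEN    # assumed weighting of the KL term; the paper's scaling may differ

# Variational parameters: a mean and a log-scale parameter per weight (mean-field Gaussian).
mu_w1 = torch.zeros(N_HIDDEN, D_IN, requires_grad=True)
rho_w1 = torch.full((N_HIDDEN, D_IN), -3.0, requires_grad=True)
mu_w2 = torch.zeros(1, N_HIDDEN, requires_grad=True)
rho_w2 = torch.full((1, N_HIDDEN), -3.0, requires_grad=True)

def sample_weights(mu, rho):
    """Reparametrization trick: w = mu + softplus(rho) * eps with eps ~ N(0, I)."""
    sigma = torch.nn.functional.softplus(rho)
    eps = torch.randn_like(mu)
    return mu + sigma * eps, sigma

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over all weights (assumed unit Gaussian prior)."""
    return (0.5 * (sigma ** 2 + mu ** 2 - 1.0) - torch.log(sigma)).sum()

def neg_elbo_estimate(x, y):
    """One-sample Monte Carlo estimate of the negative weighted ELBO on a minibatch."""
    w1, s1 = sample_weights(mu_w1, rho_w1)
    w2, s2 = sample_weights(mu_w2, rho_w2)
    pred = (torch.tanh(x @ w1.T) @ w2.T) / N_HIDDEN      # 1/N mean-field scaling, assumed
    nll = 0.5 * ((pred - y) ** 2).sum()                  # Gaussian log-likelihood up to constants
    kl = kl_to_standard_normal(mu_w1, s1) + kl_to_standard_normal(mu_w2, s2)
    return nll + KL_SCALE * kl

# Toy training loop on synthetic data.
opt = torch.optim.SGD([mu_w1, rho_w1, mu_w2, rho_w2], lr=1e-2)
x = torch.linspace(-1, 1, 32).unsqueeze(1)
y = torch.sin(3 * x)
for _ in range(200):
    opt.zero_grad()
    neg_elbo_estimate(x, y).backward()
    opt.step()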

Cite

Text

Descours et al. "Law of Large Numbers for Bayesian Two-Layer Neural Network Trained with Variational Inference." Conference on Learning Theory, 2023.

Markdown

[Descours et al. "Law of Large Numbers for Bayesian Two-Layer Neural Network Trained with Variational Inference." Conference on Learning Theory, 2023.](https://mlanthology.org/colt/2023/descours2023colt-law/)

BibTeX

@inproceedings{descours2023colt-law,
  title     = {{Law of Large Numbers for Bayesian Two-Layer Neural Network Trained with Variational Inference}},
  author    = {Descours, Arnaud and Huix, Tom and Guillin, Arnaud and Michel, Manon and Moulines, Éric and Nectoux, Boris},
  booktitle = {Conference on Learning Theory},
  year      = {2023},
  pages     = {4657--4695},
  volume    = {195},
  url       = {https://mlanthology.org/colt/2023/descours2023colt-law/}
}