Ensemble Learning for Multi-Layer Networks
Abstract
Bayesian treatments of learning in neural networks are typically based either on local Gaussian approximations to a mode of the posterior weight distribution, or on Markov chain Monte Carlo simulations. A third approach, called ensemble learning, was introduced by Hinton and van Camp (1993). It aims to approximate the posterior distribution by minimizing the Kullback-Leibler divergence between the true posterior and a parametric approximating distribution. However, the derivation of a deterministic algorithm relied on the use of a Gaussian approximating distribution with a diagonal covariance matrix and so was unable to capture the posterior correlations between parameters. In this paper, we show how the ensemble learning approach can be extended to full-covariance Gaussian distributions while remaining computationally tractable. We also extend the framework to deal with hyperparameters, leading to a simple re-estimation procedure. Initial results from a standard benchmark problem are encouraging.
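To make the objective concrete, the sketch below is a minimal, hypothetical NumPy illustration (not the paper's deterministic algorithm): it evaluates the variational free energy F(q) = E_q[E_D(w) + E_W(w)] − H(q) for a full-covariance Gaussian approximation q(w) = N(μ, LLᵀ) over the weights of a tiny one-hidden-layer network. The toy data, the tanh hidden units (the paper uses erf-type units so that the required expectations can be taken analytically rather than by sampling), the Monte Carlo estimate of the expectations, the hyperparameter values `alpha` and `beta`, and the helper names `predict` and `free_energy` are all assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only).
X = rng.normal(size=(20, 1))
t = np.sin(2.0 * X[:, 0]) + 0.1 * rng.normal(size=20)

# One-hidden-layer network: n_hidden tanh units, linear output.
n_hidden = 3
n_in = X.shape[1]
n_weights = n_hidden * (n_in + 1) + (n_hidden + 1)  # hidden layer + output layer

def predict(w, X):
    """Forward pass, unpacking the flat weight vector w into layers."""
    i = 0
    W1 = w[i:i + n_hidden * n_in].reshape(n_hidden, n_in); i += n_hidden * n_in
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden]; i += n_hidden
    b2 = w[i]
    h = np.tanh(X @ W1.T + b1)   # hidden-unit activations
    return h @ W2 + b2           # scalar network output per pattern

alpha, beta = 0.1, 25.0  # prior precision and noise precision (illustrative hyperparameters)

def free_energy(mu, L, n_samples=200):
    """Monte Carlo estimate of F(q) = E_q[E_D(w) + E_W(w)] - H(q),
    where q(w) = N(mu, L L^T) is a full-covariance Gaussian ensemble."""
    k = mu.size
    eps = rng.normal(size=(n_samples, k))
    W = mu + eps @ L.T  # reparameterised samples w ~ q
    E_D = 0.5 * beta * np.mean([np.sum((t - predict(w, X)) ** 2) for w in W])
    E_W = 0.5 * alpha * np.mean(np.sum(W ** 2, axis=1))
    # Entropy of a Gaussian with covariance L L^T.
    entropy = 0.5 * k * (1.0 + np.log(2.0 * np.pi)) + np.sum(np.log(np.diag(L)))
    return E_D + E_W - entropy

# Full-covariance q: mean vector plus lower-triangular Cholesky factor.
mu = rng.normal(scale=0.1, size=n_weights)
L = 0.1 * np.eye(n_weights)
print("initial free energy:", free_energy(mu, L))
```

In this sketch the free energy would be minimized with respect to μ and the Cholesky factor L; the off-diagonal entries of L are what allow the approximation to capture posterior correlations between weights, which a diagonal-covariance ensemble cannot.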
Cite
Text
Barber and Bishop. "Ensemble Learning for Multi-Layer Networks." Neural Information Processing Systems, 1997.
Markdown
[Barber and Bishop. "Ensemble Learning for Multi-Layer Networks." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/barber1997neurips-ensemble/)
BibTeX
@inproceedings{barber1997neurips-ensemble,
title = {{Ensemble Learning for Multi-Layer Networks}},
author = {Barber, David and Bishop, Christopher M.},
booktitle = {Neural Information Processing Systems},
year = {1997},
pages = {395-401},
url = {https://mlanthology.org/neurips/1997/barber1997neurips-ensemble/}
}