Selecting Weighting Factors in Logarithmic Opinion Pools
Abstract
A simple linear averaging of the outputs of several networks, as in, e.g., bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the average model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the log-likelihood or the Kullback-Leibler divergence, and linear averaging of the outputs to logarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variance decompositions, and quadratic programming to find the optimal weighting factors is not specific to the sum-squared error, but applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.
Cite

Text

Heskes. "Selecting Weighting Factors in Logarithmic Opinion Pools." Neural Information Processing Systems, 1997.

Markdown

[Heskes. "Selecting Weighting Factors in Logarithmic Opinion Pools." Neural Information Processing Systems, 1997.](https://mlanthology.org/neurips/1997/heskes1997neurips-selecting/)

BibTeX
@inproceedings{heskes1997neurips-selecting,
  title = {{Selecting Weighting Factors in Logarithmic Opinion Pools}},
  author = {Heskes, Tom},
  booktitle = {Neural Information Processing Systems},
  year = {1997},
  pages = {266--272},
  url = {https://mlanthology.org/neurips/1997/heskes1997neurips-selecting/}
}