A Statistical Approach to Learning and Generalization in Layered Neural Networks
Abstract
This paper presents a general statistical description of the problem of learning from examples. Our focus is learning in layered networks, which is posed as an optimization problem: a search in the network parameter space for a network that minimizes an additive error function of the statistically independent examples. By imposing the equivalence of the minimum error and maximum likelihood criteria for training the network, we arrive at the Gibbs distribution on the ensemble of networks with fixed architecture. The probability of correct prediction of a novel example is used as a measure of the generalization ability of the trained network. The entropy of the prediction distribution is shown to be the relevant measure of the network's performance, and is derived directly from the statistical properties of the ensemble. This can be viewed as a link between information-theoretic model-order-estimation techniques, particularly the predictive Minimum Description Length, and the statistical mechanical properties of neural networks. As important theoretical applications of the proposed formalism, we discuss optimal training strategies and asymptotic learning curves, i.e., the generalization ability as a function of the number of examples.
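The Gibbs-distribution argument above can be illustrated on a toy problem. The following Python is a minimal sketch under assumptions that are ours, not the paper's: a finite ensemble of one-dimensional threshold units on a grid (`WEIGHTS`), a uniform prior, a 0/1 per-example error, a fixed inverse temperature `beta`, and a hypothetical teacher threshold `TEACHER` that generates the labels. Writing the per-example likelihood as P(y | x, w) = exp(-beta * eps(y | x, w)) / z(beta), with z independent of w, makes the negative log-likelihood equal to beta times the additive training error plus a constant. Minimizing error and maximizing likelihood then coincide, and the posterior over the ensemble is the Gibbs distribution P(w | D) ∝ exp(-beta * E(w)). The ensemble-averaged probability assigned to novel examples traces a learning curve as the number of examples m grows.

```python
import math
import random

# Hypothetical toy setting (an assumption, not from the paper): inputs x in [0, 1],
# labels produced by a "teacher" threshold, and a finite ensemble of threshold units.
TEACHER = 0.3
WEIGHTS = [i / 100 for i in range(101)]  # grid of candidate networks, uniform prior


def predict(w, x):
    """Deterministic output of the threshold network w on input x."""
    return 1 if x >= w else 0


def error(w, x, y):
    """Per-example 0/1 error eps(y | x, w)."""
    return 0.0 if predict(w, x) == y else 1.0


def likelihood(w, x, y, beta):
    """P(y | x, w) = exp(-beta * eps) / z(beta), normalized over y in {0, 1}."""
    z = 1.0 + math.exp(-beta)  # one label has eps = 0, the other has eps = 1
    return math.exp(-beta * error(w, x, y)) / z


def gibbs_posterior(data, beta):
    """P(w | data) proportional to exp(-beta * E(w)), E = summed per-example error."""
    energies = [sum(error(w, x, y) for x, y in data) for w in WEIGHTS]
    e_min = min(energies)  # shift by the minimum energy for numerical stability
    unnorm = [math.exp(-beta * (e - e_min)) for e in energies]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predictive(posterior, x, y, beta):
    """Ensemble-averaged probability of the novel example's label y given x."""
    return sum(p * likelihood(w, x, y, beta) for p, w in zip(posterior, WEIGHTS))


def sample(m, rng):
    """Draw m labeled examples from the hypothetical teacher."""
    xs = [rng.random() for _ in range(m)]
    return [(x, predict(TEACHER, x)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    beta = 5.0
    held_out = sample(2000, rng)
    for m in (1, 5, 20, 80):
        post = gibbs_posterior(sample(m, rng), beta)
        # Average log-probability of correct prediction on novel examples
        avg_log_p = sum(
            math.log(predictive(post, x, y, beta)) for x, y in held_out
        ) / len(held_out)
        print(f"m={m:3d}  <log P(correct)> = {avg_log_p:.4f}")
```

Running the script prints the average log-probability of correct prediction for m = 1, 5, 20, 80 training examples. Because the likelihood is normalized at finite beta, even an ensemble concentrated entirely on error-free networks assigns the correct label a probability of at most 1 / (1 + exp(-beta)), so in this sketch the curve saturates below zero rather than at it.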
Cite
Text
Levin et al. "A Statistical Approach to Learning and Generalization in Layered Neural Networks." Annual Conference on Computational Learning Theory, 1989. doi:10.1016/B978-0-08-094829-4.50020-9
Markdown
[Levin et al. "A Statistical Approach to Learning and Generalization in Layered Neural Networks." Annual Conference on Computational Learning Theory, 1989.](https://mlanthology.org/colt/1989/levin1989colt-statistical/) doi:10.1016/B978-0-08-094829-4.50020-9
BibTeX
@inproceedings{levin1989colt-statistical,
title = {{A Statistical Approach to Learning and Generalization in Layered Neural Networks}},
author = {Levin, Esther and Tishby, Naftali and Solla, Sara A.},
booktitle = {Annual Conference on Computational Learning Theory},
year = {1989},
pages = {245--260},
doi = {10.1016/B978-0-08-094829-4.50020-9},
url = {https://mlanthology.org/colt/1989/levin1989colt-statistical/}
}