Geometry of Early Stopping in Linear Networks
Abstract
A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have least-squares weights lying on a certain plane. This latter fact can be exploited to estimate the probability of stopping at any given point along the trajectory from the initial weight vector to the least-squares weights derived from the training set, and to estimate the probability that training goes on indefinitely. The prospects for extending this theory to nonlinear models are discussed.
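The quadric claim can be checked numerically with a minimal sketch, which is not taken from the paper itself. It assumes sum-of-squares error, so that the training and validation errors are quadratic forms 0.5 (w - w_train)^T A (w - w_train) and 0.5 (w - w_val)^T B (w - w_val). Under gradient descent in continuous time, the validation error is stationary where the two gradients are orthogonal, which gives the quadric (w - w_val)^T B A (w - w_train) = 0. The matrices `A` and `B`, the points `w_train` and `w_val`, and the helper names are hypothetical values chosen for illustration, and continuous time is approximated by small Euler steps.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def grad(H, w_star, w):
    """Gradient H (w - w_star) of the error 0.5 (w - w_star)^T H (w - w_star)."""
    return matvec(H, sub(w, w_star))

def error(H, w_star, w):
    d = sub(w, w_star)
    return 0.5 * dot(d, matvec(H, d))

def early_stop(A, w_train, B, w_val, w0, dt=1e-3, max_steps=100_000):
    """Euler steps along dw/dt = -grad E_train; stop when E_val first rises."""
    w = list(w0)
    for step in range(max_steps):
        g = grad(A, w_train, w)
        w_next = [wi - dt * gi for wi, gi in zip(w, g)]
        if error(B, w_val, w_next) > error(B, w_val, w):
            return w, step
        w = w_next
    return w, max_steps

# Hypothetical 2-D problem (assumed values, not from the paper).
A = [[4.0, 0.0], [0.0, 1.0]]   # training-set Hessian
w_train = [1.0, 1.0]           # training least-squares weights
B = [[1.0, 0.0], [0.0, 1.0]]   # validation-set Hessian
w_val = [1.0, 0.5]             # validation least-squares weights

w_stop, steps = early_stop(A, w_train, B, w_val, w0=[0.0, 0.0])

# Quadric residual: inner product of the two gradients at the stopping point.
residual = dot(grad(B, w_val, w_stop), grad(A, w_train, w_stop))
print(steps, w_stop, residual)
```

The residual is not exactly zero because the Euler step overshoots the continuous-time stopping point by up to one step, so it shrinks as `dt` is reduced. If the validation error never rises, the loop runs to `max_steps`, which corresponds to the abstract's case where training goes on indefinitely.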
Cite
Text
Dodier. "Geometry of Early Stopping in Linear Networks." Neural Information Processing Systems, 1995.
Markdown
[Dodier. "Geometry of Early Stopping in Linear Networks." Neural Information Processing Systems, 1995.](https://mlanthology.org/neurips/1995/dodier1995neurips-geometry/)
BibTeX
@inproceedings{dodier1995neurips-geometry,
title = {{Geometry of Early Stopping in Linear Networks}},
author = {Dodier, Robert H.},
booktitle = {Neural Information Processing Systems},
year = {1995},
pages = {365--371},
url = {https://mlanthology.org/neurips/1995/dodier1995neurips-geometry/}
}