Optimal Stopping and Effective Machine Complexity in Learning

Abstract

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection.
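
The setting described in the abstract can be pictured concretely: a network whose input weights are fixed and whose linear output weights are trained by gradient descent, with training halted by a held-out error criterion rather than run to the global minimum of the empirical error. The sketch below is illustrative only and not taken from the paper; the data, network sizes, learning rate, and the patience-based stopping rule are all assumptions made for the example.

```python
# Minimal sketch (assumptions, not the paper's experiment): gradient descent on a
# network with fixed random input weights and a trainable linear output layer,
# stopped when held-out error stops improving instead of at the training-error minimum.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_val, d, h = 40, 200, 10, 30           # few examples, many hidden units (assumed sizes)
W = rng.normal(size=(d, h))                       # fixed input weights (never trained)

def features(X):
    return np.tanh(X @ W)                         # fixed nonlinear features

def target(X):
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]        # assumed ground-truth function

X_tr, X_va = rng.normal(size=(n_train, d)), rng.normal(size=(n_val, d))
y_tr = target(X_tr) + 0.3 * rng.normal(size=n_train)   # noisy training labels
y_va = target(X_va)                                      # clean held-out targets

Phi_tr, Phi_va = features(X_tr), features(X_va)
w = np.zeros(h)                                   # trainable linear output weights
lr, best_val, best_w, patience, wait = 0.01, np.inf, w.copy(), 200, 0

for t in range(20000):
    grad = Phi_tr.T @ (Phi_tr @ w - y_tr) / n_train     # gradient of empirical squared error
    w -= lr * grad
    val = np.mean((Phi_va @ w - y_va) ** 2)             # held-out (generalization) error
    if val < best_val:
        best_val, best_w, wait = val, w.copy(), 0
    else:
        wait += 1
        if wait >= patience:                      # stop before the empirical-error minimum
            break

print(f"stopped at step {t}, held-out MSE {best_val:.3f}, "
      f"training MSE {np.mean((Phi_tr @ best_w - y_tr) ** 2):.3f}")
```

Running a sketch of this kind typically shows the training error decreasing monotonically while the held-out error first falls and then rises, which is the behavior the paper analyzes when it relates early stopping to a reduction in the effective size of the machine.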

Cite

Text

Wang et al. "Optimal Stopping and Effective Machine Complexity in Learning." Neural Information Processing Systems, 1993.

Markdown

[Wang et al. "Optimal Stopping and Effective Machine Complexity in Learning." Neural Information Processing Systems, 1993.](https://mlanthology.org/neurips/1993/wang1993neurips-optimal/)

BibTeX

@inproceedings{wang1993neurips-optimal,
  title     = {{Optimal Stopping and Effective Machine Complexity in Learning}},
  author    = {Wang, Changfeng and Venkatesh, Santosh S. and Judd, J. Stephen},
  booktitle = {Neural Information Processing Systems},
  year      = {1993},
  pages     = {303--310},
  url       = {https://mlanthology.org/neurips/1993/wang1993neurips-optimal/}
}