Generalization Dynamics in LMS Trained Linear Networks

Abstract

For a simple linear case, a mathematical analysis of the training and generalization (validation) performance of networks trained by gradient descent on a Least Mean Square cost function is provided as a function of the learning parameters and of the statistics of the training data base. The analysis predicts that generalization error dynamics are very dependent on a priori initial weights. In particular, the generalization error might sometimes weave within a computable range during extended training. In some cases, the analysis provides bounds on the optimal number of training cycles for minimal validation error. For a speech labeling task, predicted weaving effects were qualitatively tested and observed by computer simulations in networks trained by the linear and non-linear back-propagation algorithm.
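The setting the abstract describes, gradient descent on an LMS cost for a linear network while monitoring validation error, can be illustrated numerically. Below is a minimal sketch, not the paper's analysis or code: the synthetic teacher data, the noise level, the learning rate, and the deliberately large initial weights are all illustrative assumptions, chosen only to show how one would track training-epoch dynamics of validation error and locate the epoch at which it is minimal.

```python
# Minimal sketch (illustrative, not the paper's code): a single-layer linear
# network trained by batch gradient descent on a Least Mean Square cost,
# with validation error recorded at every epoch. The paper's analysis
# predicts that these dynamics depend strongly on the initial weights.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear teacher: y = X @ w_true + noise (assumed setup)
n_in, n_train, n_val = 20, 50, 200
w_true = rng.normal(size=n_in)
X_train = rng.normal(size=(n_train, n_in))
X_val = rng.normal(size=(n_val, n_in))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)
y_val = X_val @ w_true + 0.5 * rng.normal(size=n_val)

w = 2.0 * rng.normal(size=n_in)  # "a priori" initial weights (illustrative scale)
lr, epochs = 0.005, 2000

def mse(X, y, w):
    r = X @ w - y
    return float(r @ r) / len(y)

val_curve = []
for t in range(epochs):
    # Gradient of the LMS cost (1/n) * ||X w - y||^2 on the training set
    grad = 2.0 * X_train.T @ (X_train @ w - y_train) / n_train
    w -= lr * grad
    val_curve.append(mse(X_val, y_val, w))

best = int(np.argmin(val_curve))
print(f"best validation error {val_curve[best]:.4f} at epoch {best}")
print(f"final validation error {val_curve[-1]:.4f} after {epochs} epochs")
```

With mismatched training and validation statistics and suitably chosen initial weights, the recorded `val_curve` need not decrease monotonically, which is the kind of non-monotonic behavior the abstract calls weaving.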

Cite

Text

Chauvin. "Generalization Dynamics in LMS Trained Linear Networks." Neural Information Processing Systems, 1990.

Markdown

[Chauvin. "Generalization Dynamics in LMS Trained Linear Networks." Neural Information Processing Systems, 1990.](https://mlanthology.org/neurips/1990/chauvin1990neurips-generalization/)

BibTeX

@inproceedings{chauvin1990neurips-generalization,
  title     = {{Generalization Dynamics in LMS Trained Linear Networks}},
  author    = {Chauvin, Yves},
  booktitle = {Neural Information Processing Systems},
  year      = {1990},
  pages     = {890--896},
  url       = {https://mlanthology.org/neurips/1990/chauvin1990neurips-generalization/}
}