A Simple Weight Decay Can Improve Generalization

Abstract

It has been observed in numerical simulations that a weight decay can improve generalization in a feed-forward neural network. This paper explains why. It is proven that a weight decay has two effects in a linear network. First, it suppresses any irrelevant components of the weight vector by choosing the smallest vector that solves the learning problem. Second, if the size is chosen right, a weight decay can suppress some of the effects of static noise on the targets, which improves generalization quite a lot. It is then shown how to extend these results to networks with hidden layers and non-linear units. Finally the theory is confirmed by some numerical simulations using the data from NetTalk.
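The linear-network case the abstract describes is easy to illustrate: adding an L2 penalty λ‖w‖² to the squared error yields the ridge-regression solution w = (XᵀX + λI)⁻¹Xᵀy, which shrinks weight components the data do not constrain and damps the effect of noise on the targets. The sketch below is a minimal NumPy illustration of that idea; the toy data, the choice of λ, and the function names are assumptions made for this example, not the paper's NetTalk experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear teacher: only a few input components matter, and the targets carry static noise.
n_samples, n_inputs = 30, 10
true_w = np.zeros(n_inputs)
true_w[:3] = [1.0, -2.0, 0.5]                         # the remaining 7 inputs are irrelevant
X = rng.normal(size=(n_samples, n_inputs))
y = X @ true_w + 0.3 * rng.normal(size=n_samples)     # noisy targets

def fit_linear(X, y, weight_decay=0.0):
    """Least squares with an L2 (weight decay) penalty: min_w ||Xw - y||^2 + lambda * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + weight_decay * np.eye(d), X.T @ y)

w_plain = fit_linear(X, y, weight_decay=0.0)
w_decay = fit_linear(X, y, weight_decay=1.0)          # lambda chosen ad hoc for illustration

# Weight decay pulls the unconstrained/irrelevant components toward zero,
# giving a smaller-norm solution that is less sensitive to the target noise.
print("||w|| without decay:", np.linalg.norm(w_plain))
print("||w|| with decay:   ", np.linalg.norm(w_decay))
```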

Cite

Text

Krogh and Hertz. "A Simple Weight Decay Can Improve Generalization." Neural Information Processing Systems, 1991.

Markdown

[Krogh and Hertz. "A Simple Weight Decay Can Improve Generalization." Neural Information Processing Systems, 1991.](https://mlanthology.org/neurips/1991/krogh1991neurips-simple/)

BibTeX

@inproceedings{krogh1991neurips-simple,
  title     = {{A Simple Weight Decay Can Improve Generalization}},
  author    = {Krogh, Anders and Hertz, John A.},
  booktitle = {Neural Information Processing Systems},
  year      = {1991},
  pages     = {950--957},
  url       = {https://mlanthology.org/neurips/1991/krogh1991neurips-simple/}
}