Optimal Brain Damage

Abstract

We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
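The second-derivative tradeoff works as follows: under a diagonal, quadratic approximation of the loss around a trained minimum, deleting weight w_k increases the training error by roughly s_k = h_kk * w_k^2 / 2, where h_kk is the corresponding diagonal entry of the Hessian. Weights with the smallest saliencies s_k are deleted first. Below is a minimal sketch of this pruning step, assuming the diagonal Hessian entries have already been estimated (e.g. by a backpropagation-style second-derivative pass as described in the paper); the function names and random data are illustrative, not from the authors' code:

```python
import numpy as np

def obd_saliencies(weights: np.ndarray, hessian_diag: np.ndarray) -> np.ndarray:
    """Saliency of each weight under the diagonal quadratic
    approximation used by Optimal Brain Damage: s_k = h_kk * w_k^2 / 2."""
    return 0.5 * hessian_diag * weights**2

def obd_prune(weights: np.ndarray, hessian_diag: np.ndarray,
              fraction: float) -> np.ndarray:
    """Return a mask (1 = keep, 0 = delete) that zeroes out the given
    fraction of weights with the lowest saliency."""
    s = obd_saliencies(weights, hessian_diag)
    n_delete = int(fraction * weights.size)
    # Indices of the least-salient weights: deleting these is expected
    # to perturb the training error the least.
    delete_idx = np.argsort(s, axis=None)[:n_delete]
    mask = np.ones(weights.size)
    mask[delete_idx] = 0.0
    return mask.reshape(weights.shape)

# Illustrative usage with random values standing in for a trained
# network's weights and estimated second derivatives.
w = np.random.randn(100)
h = np.abs(np.random.randn(100))  # h_kk >= 0 under the paper's approximation
mask = obd_prune(w, h, fraction=0.3)
w_pruned = w * mask
```

In the paper this step is part of an iterative loop: train the network to a minimum, estimate the second derivatives, delete the low-saliency weights, and retrain the smaller network.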

Cite

Text

LeCun et al. "Optimal Brain Damage." Neural Information Processing Systems, 1989.

Markdown

[LeCun et al. "Optimal Brain Damage." Neural Information Processing Systems, 1989.](https://mlanthology.org/neurips/1989/lecun1989neurips-optimal/)

BibTeX

@inproceedings{lecun1989neurips-optimal,
  title     = {{Optimal Brain Damage}},
  author    = {LeCun, Yann and Denker, John S. and Solla, Sara A.},
  booktitle = {Neural Information Processing Systems},
  year      = {1989},
  pages     = {598-605},
  url       = {https://mlanthology.org/neurips/1989/lecun1989neurips-optimal/}
}