Deep Nets Don't Learn via Memorization

Abstract

We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by *memorizing* training data, in spite of overly-expressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing that: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but *shorter* for random inputs, and (3) DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g. dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.
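The "random labels" condition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code; `randomize_labels` and the toy dataset are hypothetical names for the standard setup: shuffle labels across examples so the label distribution is preserved but every input-label association is destroyed, then train the same architecture on both versions.

```python
import random

def randomize_labels(dataset, seed=0):
    """Return a copy of (input, label) pairs with labels shuffled
    across examples. The label multiset is unchanged, but any
    relationship between inputs and labels is destroyed."""
    inputs = [x for x, _ in dataset]
    labels = [y for _, y in dataset]
    rng = random.Random(seed)  # fixed seed for reproducibility
    rng.shuffle(labels)
    return list(zip(inputs, labels))

# Toy dataset: 8 examples with binary labels.
data = [(i, i % 2) for i in range(8)]
noisy = randomize_labels(data, seed=1)
```

Under this protocol, the paper's claims predict that fitting `noisy` requires more capacity and more epochs than fitting `data`, even though both are finite samples of the same size.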

Cite

Text

Krueger et al. "Deep Nets Don't Learn via Memorization." International Conference on Learning Representations, 2017.

Markdown

[Krueger et al. "Deep Nets Don't Learn via Memorization." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/krueger2017iclr-deep/)

BibTeX

@inproceedings{krueger2017iclr-deep,
  title     = {{Deep Nets Don't Learn via Memorization}},
  author    = {Krueger, David and Ballas, Nicolas and Jastrzebski, Stanislaw and Arpit, Devansh and Kanwal, Maxinder S. and Maharaj, Tegan and Bengio, Emmanuel and Fischer, Asja and Courville, Aaron C.},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/krueger2017iclr-deep/}
}