When Do Neural Networks Outperform Kernel Methods?

Abstract

For a certain scaling of the initialization of stochastic gradient descent (SGD), wide neural networks (NNs) have been shown to be well approximated by reproducing kernel Hilbert space (RKHS) methods. Recent empirical work showed that, for some classification tasks, RKHS methods can replace NNs without a large loss in performance. On the other hand, two-layer NNs are known to encode richer smoothness classes than RKHS, and we know of special examples for which SGD-trained NNs provably outperform RKHS methods. This is true even in the wide network limit, for a different scaling of the initialization.
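
As a rough illustration of the comparison discussed in the abstract, the sketch below contrasts a two-layer NN trained with plain SGD against kernel ridge regression (an RKHS method) on a synthetic regression task whose target depends on a single direction of nearly isotropic covariates. This is not code from the paper; the data model, kernel choice, network width, and all hyperparameters are assumptions chosen only to make the NN-versus-RKHS comparison concrete.

# Minimal sketch (assumptions, not the paper's setup): two-layer NN with SGD
# versus kernel ridge regression on a target depending on one direction w_star.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
d, n_train, n_test = 50, 2000, 1000

# Nearly isotropic Gaussian covariates; target is a nonlinear function
# of the one-dimensional projection X @ w_star (hypothetical choice).
w_star = rng.normal(size=d) / np.sqrt(d)
def target(X):
    z = X @ w_star
    return np.maximum(z, 0.0) - 0.5 * z

X_train = rng.normal(size=(n_train, d))
y_train = target(X_train)
X_test = rng.normal(size=(n_test, d))
y_test = target(X_test)

# RKHS baseline: kernel ridge regression with an RBF kernel.
krr = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0 / d)
krr.fit(X_train, y_train)
mse_krr = np.mean((krr.predict(X_test) - y_test) ** 2)

# Two-layer NN, f(x) = a^T relu(W x), both layers trained with plain SGD.
m, lr, epochs, batch = 256, 0.05, 200, 64
W = rng.normal(size=(m, d)) / np.sqrt(d)   # first-layer weights
a = rng.normal(size=m) / np.sqrt(m)        # second-layer weights

for _ in range(epochs):
    perm = rng.permutation(n_train)
    for i in range(0, n_train, batch):
        idx = perm[i:i + batch]
        Xb, yb = X_train[idx], y_train[idx]
        H = np.maximum(Xb @ W.T, 0.0)       # hidden activations, shape (b, m)
        err = H @ a - yb                    # residual for 0.5 * MSE loss
        grad_a = H.T @ err / len(idx)
        grad_W = ((err[:, None] * (H > 0)) * a).T @ Xb / len(idx)
        a -= lr * grad_a
        W -= lr * grad_W

H_test = np.maximum(X_test @ W.T, 0.0)
mse_nn = np.mean((H_test @ a - y_test) ** 2)

print(f"kernel ridge test MSE: {mse_krr:.4f}")
print(f"two-layer NN  test MSE: {mse_nn:.4f}")

Because the target depends on a single low-dimensional direction, the NN can adapt its first-layer weights to that direction, whereas the fixed RBF kernel cannot; this is one concrete instance of the gap between the two regimes that the paper studies.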

Cite

Text

Ghorbani et al. "When Do Neural Networks Outperform Kernel Methods?" Neural Information Processing Systems, 2020.

Markdown

[Ghorbani et al. "When Do Neural Networks Outperform Kernel Methods?" Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/ghorbani2020neurips-neural/)

BibTeX

@inproceedings{ghorbani2020neurips-neural,
  title     = {{When Do Neural Networks Outperform Kernel Methods?}},
  author    = {Ghorbani, Behrooz and Mei, Song and Misiakiewicz, Theodor and Montanari, Andrea},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/ghorbani2020neurips-neural/}
}