Neural Networks Learning and Memorization with (almost) No Over-Parameterization

Abstract

Many results in recent years established polynomial time learnability of various models via neural network algorithms (e.g. \cite{andoni2014learning, daniely2016toward, daniely2017sgd, cao2019generalization, ziwei2019polylogarithmic, zou2019improved, ma2019comparative, du2018gradient, arora2019fine, song2019quadratic, oymak2019towards, ge2019mildly, brutzkus2018sgd}). However, unless the model is linearly separable~\cite{brutzkus2018sgd}, or the activation is a polynomial~\cite{ge2019mildly}, these results require very large networks -- much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth-two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on a depth-two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\mathbb{S}^{d-1}$.
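To make the memorization setup concrete, below is a minimal illustrative sketch: plain SGD on a depth-two ReLU network with roughly $\tilde{O}(m/d)$ hidden neurons, fit to $m$ random $\pm 1$-labeled points on the unit sphere. This is not the paper's construction or analysis; the width formula, initialization, logistic loss, learning rate, and epoch count are ad-hoc choices for the demo.

# Illustrative sketch of the memorization setup from the abstract, assuming
# ad-hoc hyperparameters (not the paper's algorithm): SGD on a depth-two ReLU
# network with ~O~(m/d) hidden neurons, trained on m random labeled points
# on the unit sphere S^{d-1}.
import numpy as np

rng = np.random.default_rng(0)

d, m = 50, 1000                                    # input dimension, number of samples
width = int(np.ceil(m / d) * np.log(m))            # ~O~(m/d) hidden neurons (heuristic log factor)

# m random labeled points on the sphere S^{d-1}
X = rng.standard_normal((m, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.choice([-1.0, 1.0], size=m)

# depth-two network f(x) = a^T ReLU(W x)
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

lr, epochs = 0.1, 50
for epoch in range(epochs):
    for i in rng.permutation(m):
        x, label = X[i], y[i]
        z = W @ x
        h = np.maximum(z, 0.0)                     # ReLU activations
        pred = a @ h
        g = -label / (1.0 + np.exp(label * pred))  # d(logistic loss)/d(pred)
        grad_a = g * h
        grad_W = np.outer(g * a * (z > 0), x)      # chain rule through the ReLU
        a -= lr * grad_a
        W -= lr * grad_W

train_acc = (np.sign(np.maximum(X @ W.T, 0.0) @ a) == y).mean()
print(f"training accuracy: {train_acc:.3f}")

Running the sketch simply reports how well this particular small network fits the random labels; it is meant to illustrate the parameter counting (width times d is on the order of m parameters), not to reproduce the paper's guarantees.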

Cite

Text

Daniely. "Neural Networks Learning and Memorization with (almost) No Over-Parameterization." Neural Information Processing Systems, 2020.

Markdown

[Daniely. "Neural Networks Learning and Memorization with (almost) No Over-Parameterization." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/daniely2020neurips-neural/)

BibTeX

@inproceedings{daniely2020neurips-neural,
  title     = {{Neural Networks Learning and Memorization with (almost) No Over-Parameterization}},
  author    = {Daniely, Amit},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/daniely2020neurips-neural/}
}