Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks
Abstract
In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming that the data is well dispersed too). We show however that these constructions give rise to networks where the \emph{magnitude} of the neurons' weights is far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons and nearly-optimal size of the weights.
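As an illustration of the size regime described in the abstract (and not the paper's constructive procedure), the following minimal Python sketch trains a two-layer ReLU network of width $4\lceil n/d \rceil$ by plain full-batch gradient descent to fit arbitrary real labels on $n$ points in $\mathbb{R}^d$. All data, dimensions, and hyperparameters below are illustrative assumptions, and plain gradient descent is not guaranteed to reach exact memorization here; the sketch only sets up the width-versus-sample-count regime.

import numpy as np

# Illustrative sketch only: width 4*ceil(n/d) matches the neuron count
# mentioned in the abstract for ReLU memorization of arbitrary real labels,
# but the training method here is plain gradient descent, not the paper's
# construction or its complex-recombination procedure.
rng = np.random.default_rng(0)

n, d = 64, 16                      # number of points and ambient dimension
m = 4 * int(np.ceil(n / d))        # network width

X = rng.standard_normal((n, d)) / np.sqrt(d)   # n points in R^d
y = rng.standard_normal(n)                     # arbitrary real labels

# Two-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x + b_j)
W = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.zeros(m)
a = rng.standard_normal(m) / np.sqrt(m)

lr, steps = 0.5, 20000
for t in range(steps):
    pre = X @ W.T + b              # (n, m) pre-activations
    h = np.maximum(pre, 0.0)       # ReLU
    err = h @ a - y                # residuals on the n labels

    # Gradients of the mean-squared error 0.5 * mean(err^2)
    grad_a = h.T @ err / n
    grad_pre = np.outer(err, a) * (pre > 0) / n
    grad_W = grad_pre.T @ X
    grad_b = grad_pre.sum(axis=0)

    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

    if t % 5000 == 0:
        print(f"step {t:6d}  mse {np.mean(err ** 2):.6f}")

final_err = np.maximum(X @ W.T + b, 0.0) @ a - y
print(f"final mse {np.mean(final_err ** 2):.6f}")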
Cite
Text
Bubeck et al. "Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks." Neural Information Processing Systems, 2020.

Markdown
[Bubeck et al. "Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/bubeck2020neurips-network/)

BibTeX
@inproceedings{bubeck2020neurips-network,
title = {{Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks}},
author = {Bubeck, Sebastien and Eldan, Ronen and Lee, Yin Tat and Mikulincer, Dan},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/bubeck2020neurips-network/}
}