Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks
Abstract
In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right)$ neurons (assuming that the data is well dispersed too). We show however that these constructions give rise to networks where the \emph{magnitude} of the neurons' weights is far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons and nearly-optimal size of the weights.
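As an illustration of the size regime described in the abstract (and not the paper's constructive procedure), the following minimal Python sketch trains a two-layer ReLU network of width $4\lceil n/d \rceil$ by plain full-batch gradient descent to fit arbitrary real labels on $n$ points in $\mathbb{R}^d$. All data, dimensions, and hyperparameters below are illustrative assumptions, and plain gradient descent is not guaranteed to reach exact memorization here; the sketch only sets up the width-versus-sample-count regime.

import numpy as np

# Illustrative sketch only: width 4*ceil(n/d) matches the neuron count
# mentioned in the abstract for ReLU memorization of arbitrary real labels,
# but the training method here is plain gradient descent, not the paper's
# construction or its complex-recombination procedure.
rng = np.random.default_rng(0)

n, d = 64, 16                      # number of points and ambient dimension
m = 4 * int(np.ceil(n / d))        # network width

X = rng.standard_normal((n, d)) / np.sqrt(d)   # n points in R^d
y = rng.standard_normal(n)                     # arbitrary real labels

# Two-layer ReLU network: f(x) = sum_j a_j * relu(w_j . x + b_j)
W = rng.standard_normal((m, d)) / np.sqrt(d)
b = np.zeros(m)
a = rng.standard_normal(m) / np.sqrt(m)

lr, steps = 0.5, 20000
for t in range(steps):
    pre = X @ W.T + b              # (n, m) pre-activations
    h = np.maximum(pre, 0.0)       # ReLU
    err = h @ a - y                # residuals on the n labels

    # Gradients of the mean-squared error 0.5 * mean(err^2)
    grad_a = h.T @ err / n
    grad_pre = np.outer(err, a) * (pre > 0) / n
    grad_W = grad_pre.T @ X
    grad_b = grad_pre.sum(axis=0)

    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

    if t % 5000 == 0:
        print(f"step {t:6d}  mse {np.mean(err ** 2):.6f}")

final_err = np.maximum(X @ W.T + b, 0.0) @ a - y
print(f"final mse {np.mean(final_err ** 2):.6f}")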
Cite
Text
Bubeck et al. "Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks." Neural Information Processing Systems, 2020.

Markdown
[Bubeck et al. "Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/bubeck2020neurips-network/)

BibTeX
@inproceedings{bubeck2020neurips-network,
title = {{Network Size and Size of the Weights in Memorization with Two-Layers Neural Networks}},
author = {Bubeck, Sebastien and Eldan, Ronen and Lee, Yin Tat and Mikulincer, Dan},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/bubeck2020neurips-network/}
}