Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality

Abstract

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite-neuron, infinite-data limit, we prove that a unique radially symmetric minimizer exists, whose average parameters and Lipschitz constant grow as $d$ and $\sqrt{d}$, respectively. We furthermore show that the average weight variable grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.
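For orientation, here is a minimal sketch of the kind of regularized interpolation problem described above, assuming the standard weight-decay formulation for a one-hidden-layer ReLU network with $m$ neurons (the notation is illustrative and not taken verbatim from the paper):

$$
f(x) = \sum_{i=1}^{m} a_i \,\max\!\big(w_i^\top x + b_i,\, 0\big),
\qquad
\min_{a,\,w,\,b}\ \frac{1}{n}\sum_{j=1}^{n} \big(f(x_j) - y_j\big)^2
\;+\; \frac{\lambda}{2}\sum_{i=1}^{m} \big(a_i^2 + \|w_i\|^2\big),
$$

where the labels are $y = 1$ at the origin and $y = 0$ outside the unit ball, and the analysis concerns the limit $m, n \to \infty$.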

Cite

Text

Wojtowytsch. "Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality." Journal of Machine Learning Research, 2024.

Markdown

[Wojtowytsch. "Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/wojtowytsch2024jmlr-optimal/)

BibTeX

@article{wojtowytsch2024jmlr-optimal,
  title     = {{Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality}},
  author    = {Wojtowytsch, Stephan},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--49},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/wojtowytsch2024jmlr-optimal/}
}