Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality

JMLR 2024 pp. 1-49

Abstract

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose average parameters and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the average weight variable grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.

Cite

Text

Wojtowytsch. "Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality." Journal of Machine Learning Research, 2024.

Markdown

[Wojtowytsch. "Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/wojtowytsch2024jmlr-optimal/)

BibTeX

@article{wojtowytsch2024jmlr-optimal,
  title     = {{Optimal Bump Functions for Shallow ReLU Networks: Weight Decay, Depth Separation, Curse of Dimensionality}},
  author    = {Wojtowytsch, Stephan},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--49},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/wojtowytsch2024jmlr-optimal/}
}