The Merged-Staircase Property: A Necessary and Nearly Sufficient Condition for SGD Learning of Sparse Functions on Two-Layer Neural Networks

Abstract

It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks) no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., a small number of coordinates). This regime is of interest because it remains poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension $d$. Our main results characterize a hierarchical property, the merged-staircase property, that is both \emph{necessary and nearly sufficient} for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) cannot learn efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions defined on a low-dimensional latent space, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for functions that are not almost orthogonal.
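To make the hierarchical property concrete, here is a small sketch of a checker for the merged-staircase property as defined in the paper: the supports $S_1, \dots, S_k$ of the monomials of the target function can be ordered so that each $S_i$ contains at most one coordinate not already covered by the preceding sets. The function name `has_msp` and the brute-force search over orderings are illustrative choices, not code from the paper.

```python
from itertools import permutations

def has_msp(sets):
    """Check the merged-staircase property: is there an ordering of the
    monomial supports such that each set introduces at most one new
    coordinate? Brute force over orderings; fine for small collections."""
    sets = [frozenset(s) for s in sets]
    for order in permutations(sets):
        covered = set()
        ok = True
        for s in order:
            if len(s - covered) > 1:  # more than one new coordinate: fails here
                ok = False
                break
            covered |= s
        if ok:
            return True
    return False

# The staircase x1 + x1*x2 + x1*x2*x3 satisfies the property,
# while an isolated degree-3 parity x1*x2*x3 does not.
print(has_msp([{1}, {1, 2}, {1, 2, 3}]))  # True
print(has_msp([{1, 2, 3}]))               # False
```

Per the paper's characterization, functions satisfying this property are the ones (up to degenerate cases) learnable by mean-field SGD with $O(d)$ samples, whereas an isolated high-degree parity such as $x_1 x_2 x_3$ is not.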

Cite

Text

Abbe et al. "The Merged-Staircase Property: A Necessary and Nearly Sufficient Condition for SGD Learning of Sparse Functions on Two-Layer Neural Networks." Conference on Learning Theory, 2022.

Markdown

[Abbe et al. "The Merged-Staircase Property: A Necessary and Nearly Sufficient Condition for SGD Learning of Sparse Functions on Two-Layer Neural Networks." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/abbe2022colt-mergedstaircase/)

BibTeX

@inproceedings{abbe2022colt-mergedstaircase,
  title     = {{The Merged-Staircase Property: A Necessary and Nearly Sufficient Condition for SGD Learning of Sparse Functions on Two-Layer Neural Networks}},
  author    = {Abbe, Emmanuel and Boix-Adsera, Enric and Misiakiewicz, Theodor},
  booktitle = {Conference on Learning Theory},
  year      = {2022},
  pages     = {4782--4887},
  volume    = {178},
  url       = {https://mlanthology.org/colt/2022/abbe2022colt-mergedstaircase/}
}