Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials

Abstract

We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.
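To make the subspace-identification step concrete, the following minimal sketch (not the authors' algorithm; all names such as `estimate_subspace`, `d`, `k`, and `n` are hypothetical choices for this demo) recovers the hidden subspace $\mathrm{span}\{w_1,\dots,w_k\}$ of a synthetic network $f(x)=\sum_i a_i\,\mathrm{relu}(\langle w_i,x\rangle)$ from the empirical second-order moment tensor $\mathbb{E}[y(xx^\top - I)]$ under Gaussian inputs. For unit-norm weights this tensor equals $(1/\sqrt{2\pi})\sum_i a_i w_i w_i^\top$, so its leading eigendirections lie in the hidden subspace.

```python
# Illustrative sketch only: NOT the authors' algorithm. It demonstrates the
# general idea of recovering span{w_1, ..., w_k} from a low-order moment
# tensor under standard Gaussian inputs.
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def estimate_subspace(X, y, k):
    """Top-k eigendirections of the empirical tensor E[y (x x^T - I)].

    For unit-norm weights, E[y (x x^T - I)] = (1/sqrt(2*pi)) * sum_i a_i w_i w_i^T,
    so its leading eigenvectors lie in span{w_1, ..., w_k}.
    """
    n, d = X.shape
    M = (X * y[:, None]).T @ X / n - np.mean(y) * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(eigvals))  # sort by eigenvalue magnitude
    return eigvecs[:, order[:k]]


rng = np.random.default_rng(0)
d, k, n = 20, 3, 200_000

# Synthetic one-hidden-layer ReLU network f(x) = sum_i a_i relu(<w_i, x>).
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.uniform(0.5, 1.5, size=k)  # same-sign coefficients avoid cancellation
X = rng.standard_normal((n, d))
y = relu(X @ W.T) @ a

V = estimate_subspace(X, y, k)  # (d, k) orthonormal basis estimate
residual = np.linalg.norm(W.T - V @ (V.T @ W.T))  # mass of W outside span(V)
print(f"residual of true weights outside recovered subspace: {residual:.3f}")
```

The demo deliberately uses same-sign coefficients: with sign-varying $a_i$ the degree-2 statistic can cancel and miss directions, which is precisely why the paper's algorithm decomposes moment tensors of order up to $O(k)$ rather than relying on second moments alone.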

Cite

Text

Diakonikolas and Kane. "Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials." Conference on Learning Theory, 2024.

Markdown

[Diakonikolas and Kane. "Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials." Conference on Learning Theory, 2024.](https://mlanthology.org/colt/2024/diakonikolas2024colt-efficiently/)

BibTeX

@inproceedings{diakonikolas2024colt-efficiently,
  title     = {{Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials}},
  author    = {Diakonikolas, Ilias and Kane, Daniel M.},
  booktitle = {Conference on Learning Theory},
  year      = {2024},
  pages     = {1364--1378},
  volume    = {247},
  url       = {https://mlanthology.org/colt/2024/diakonikolas2024colt-efficiently/}
}