Time/Accuracy Tradeoffs for Learning a ReLU with Respect to Gaussian Marginals

Abstract

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathrm{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

- Finding a ReLU with square-loss $\mathrm{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, *unconditionally*, that gradient descent cannot converge to the global minimum in polynomial time.
- There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathrm{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.

Prior work due to Soltanolkotabi (2017) showed that gradient descent *can* find the best-fitting ReLU with respect to Gaussian marginals, if the training set is exactly labeled by a ReLU.
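
To make the learning problem concrete, below is a minimal NumPy sketch of the quantity under study: the empirical square loss of a ReLU $x \mapsto \max(w \cdot x, 0)$ on spherical Gaussian examples with arbitrarily noisy labels, minimized with plain gradient descent. It only illustrates the setup; the ground-truth direction `w_star`, the noise level, and the step size are arbitrary choices, and this is not the paper's $O(\mathrm{opt}^{2/3})$ approximation algorithm, which instead works via a reduction to noisy halfspace learning.

```python
# Illustrative sketch (not the paper's algorithm): empirical square loss for a
# ReLU under spherical Gaussian inputs, minimized with plain gradient descent.
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000

# Spherical Gaussian examples; labels are a ground-truth ReLU plus noise,
# so the best achievable loss ("opt") is strictly positive.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.standard_normal(n)

def square_loss(w):
    """Empirical square loss of the ReLU x -> max(w.x, 0)."""
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

def gradient(w):
    """(Sub)gradient of the empirical square loss with respect to w."""
    z = X @ w
    residual = np.maximum(z, 0.0) - y
    active = (z > 0).astype(float)  # derivative of the ReLU
    return 2.0 * (X * (residual * active)[:, None]).mean(axis=0)

# Plain gradient descent on the (non-convex) empirical loss.
w = 0.01 * rng.standard_normal(d)
for _ in range(2000):
    w -= 0.5 * gradient(w)

print(f"loss at point found by gradient descent: {square_loss(w):.4f}")
print(f"loss at the ground-truth direction:      {square_loss(w_star):.4f}")
```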

Cite

Text

Goel et al. "Time/Accuracy Tradeoffs for Learning a ReLU with Respect to Gaussian Marginals." Neural Information Processing Systems, 2019.

Markdown

[Goel et al. "Time/Accuracy Tradeoffs for Learning a ReLU with Respect to Gaussian Marginals." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/goel2019neurips-time/)

BibTeX

@inproceedings{goel2019neurips-time,
  title     = {{Time/Accuracy Tradeoffs for Learning a ReLU with Respect to Gaussian Marginals}},
  author    = {Goel, Surbhi and Karmalkar, Sushrut and Klivans, Adam},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {8584--8593},
  url       = {https://mlanthology.org/neurips/2019/goel2019neurips-time/}
}