A Faster and Simpler Algorithm for Learning Shallow Networks
Abstract
We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. recently gave the first algorithm for this problem to run in $\mathrm{poly}(d,1/\epsilon)$ time when $k = O(1)$, where $\epsilon$ is the target error. More precisely, their algorithm runs in time $(d/\epsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d k/\epsilon)^{O(k^2)}$.
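To make the learning setup concrete, here is a minimal sketch (in Python with NumPy) of the data model described in the abstract: examples drawn from the standard $d$-dimensional Gaussian and labeled by a linear combination of $k$ ReLUs. This is only an illustration of the problem instance, not the authors' algorithm; the specific dimensions, sample size, and the zero-function baseline are arbitrary choices for the example.

```python
# Sketch of the learning problem (not the paper's algorithm):
# x ~ N(0, I_d), y = sum_{i=1}^k a_i * ReLU(<w_i, x>), and a learner
# must output a hypothesis h with E_x[(h(x) - y)^2] <= epsilon.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 3, 5000          # dimension, number of ReLUs, sample size (illustrative)

# Ground-truth network: unit-norm weight vectors w_i and coefficients a_i.
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.standard_normal(k)

def target(X):
    """f(x) = sum_i a_i * ReLU(<w_i, x>)."""
    return np.maximum(X @ W.T, 0.0) @ a

# Labeled examples drawn from the standard d-dimensional Gaussian measure.
X = rng.standard_normal((n, d))
y = target(X)

# A learner is judged by squared error on fresh Gaussian samples; here we
# evaluate a trivial baseline (the zero function) just to show the metric.
X_test = rng.standard_normal((n, d))
err = np.mean((0.0 - target(X_test)) ** 2)
print(f"squared error of the zero hypothesis: {err:.3f}")
```

The point of the sketch is only the input/output contract: the algorithm analyzed in the paper receives such labeled Gaussian examples and, for constant $k$, finds an $\epsilon$-accurate hypothesis in $(dk/\epsilon)^{O(k^2)}$ time.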
Cite
Chen, Sitan and Narayanan, Shyam. "A Faster and Simpler Algorithm for Learning Shallow Networks." Conference on Learning Theory, 2024. https://mlanthology.org/colt/2024/chen2024colt-faster/
@inproceedings{chen2024colt-faster,
title = {{A Faster and Simpler Algorithm for Learning Shallow Networks}},
author = {Chen, Sitan and Narayanan, Shyam},
booktitle = {Conference on Learning Theory},
year = {2024},
pages = {981-994},
volume = {247},
url = {https://mlanthology.org/colt/2024/chen2024colt-faster/}
}