Escaping Mediocrity: How Two-Layer Networks Learn Hard Generalized Linear Models

Abstract

This study investigates the sample complexity with which two-layer neural networks learn a generalized linear target function under Stochastic Gradient Descent (SGD), focusing on the challenging regime where many flat directions are present at initialization. It is well established that $n=O(d\log d)$ samples are typically needed in this scenario. However, we provide precise results on the pre-factors in high dimensions and for varying widths. Notably, our findings suggest that overparameterization can improve convergence by at most a constant factor within this problem class. These insights follow from reducing the SGD dynamics to a low-dimensional stochastic process, where escaping mediocrity amounts to computing an exit time. Yet we show that a deterministic approximation of this process already captures the escape time, suggesting that the role of stochasticity may be minimal in this scenario.
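As a rough illustration of the setting (a sketch, not the paper's actual analysis), the following runs online SGD on a narrow two-layer network fitting a "hard" generalized linear target, here a Hermite-2 link with zero first Hermite coefficient, so the overlap between the student weights and the teacher direction starts at order $1/\sqrt{d}$ and must escape a mediocrity plateau. The dimension, width, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d, p = 128, 4                      # input dimension and width (illustrative)
lr = 0.25 / d                      # the 1/d step-size scaling used in such analyses
n_steps = 80 * d * int(np.log(d))  # on the order of d log d samples

# Teacher direction and a "hard" GLM link (zero first Hermite coefficient)
wstar = rng.standard_normal(d)
wstar /= np.linalg.norm(wstar)
target = lambda x: (x @ wstar) ** 2 - 1.0

# Student: f(x) = (1/p) * sum_j act(w_j . x), second layer frozen for simplicity
W = rng.standard_normal((p, d)) / np.sqrt(d)  # overlaps start at ~1/sqrt(d)
act = lambda z: z ** 2 - 1.0
dact = lambda z: 2.0 * z

for _ in range(n_steps):           # one-pass (online) SGD on the squared loss
    x = rng.standard_normal(d)
    pre = W @ x
    err = act(pre).mean() - target(x)
    W -= lr * np.outer(err * dact(pre) / p, x)

# Cosine overlap of each neuron with the teacher direction
overlaps = np.abs(W @ wstar) / np.linalg.norm(W, axis=1)
print(f"max |cos(w_j, w*)| after training: {overlaps.max():.2f}")
```

With these (arbitrary) choices the largest overlap typically climbs from roughly $1/\sqrt{d}\approx 0.09$ to near 1, after an initial plateau whose length grows like $d\log d$ in the sample count.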

Cite

Text

Arnaboldi et al. "Escaping Mediocrity: How Two-Layer Networks Learn Hard Generalized Linear Models." NeurIPS 2023 Workshops: OPT, 2023.

Markdown

[Arnaboldi et al. "Escaping Mediocrity: How Two-Layer Networks Learn Hard Generalized Linear Models." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/arnaboldi2023neuripsw-escaping/)

BibTeX

@inproceedings{arnaboldi2023neuripsw-escaping,
  title     = {{Escaping Mediocrity: How Two-Layer Networks Learn Hard Generalized Linear Models}},
  author    = {Arnaboldi, Luca and Krzakala, Florent and Loureiro, Bruno and Stephan, Ludovic},
  booktitle = {NeurIPS 2023 Workshops: OPT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/arnaboldi2023neuripsw-escaping/}
}