Learning One-Hidden-Layer Neural Networks with Landscape Design

Abstract

We consider the problem of learning a one-hidden-layer neural network: we assume the input $x$ is drawn from a Gaussian distribution and the label $y = a^\top \sigma(Bx) + \xi$, where $a$ is a nonnegative vector, $B$ is a full-rank weight matrix, and $\xi$ is noise. We first give an analytic formula for the population risk of the standard squared loss and demonstrate that it implicitly attempts to decompose a sequence of low-rank tensors simultaneously. Inspired by the formula, we design a non-convex objective function $G$ whose landscape is guaranteed to have the following properties: (1) all local minima of $G$ are also global minima; (2) all global minima of $G$ correspond to the ground-truth parameters; (3) the value and gradient of $G$ can be estimated using samples. With these properties, stochastic gradient descent on $G$ provably converges to a global minimum and learns the ground-truth parameters. We also prove finite sample complexity results and validate them with simulations.
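To make the generative model concrete, the following is a minimal sketch (not from the paper; function names such as `sample_data` and the choice of $\sigma$ as ReLU, consistent with the paper's setting, are ours) that draws samples from $y = a^\top \sigma(Bx) + \xi$ with Gaussian inputs and evaluates the empirical squared loss that approximates the population risk:

```python
import numpy as np

def sample_data(a, B, n, noise_std=0.1, rng=None):
    """Draw n samples from y = a^T sigma(B x) + xi,
    with x ~ N(0, I_d) and sigma = ReLU applied entrywise."""
    rng = np.random.default_rng() if rng is None else rng
    m, d = B.shape
    X = rng.standard_normal((n, d))                  # Gaussian inputs
    H = np.maximum(X @ B.T, 0.0)                     # hidden layer sigma(Bx)
    y = H @ a + noise_std * rng.standard_normal(n)   # labels with additive noise
    return X, y

def empirical_risk(a_hat, B_hat, X, y):
    """Empirical squared loss; its expectation is the population risk."""
    preds = np.maximum(X @ B_hat.T, 0.0) @ a_hat
    return np.mean((y - preds) ** 2)

# Example: d = 10 inputs, m = 4 hidden units, full-rank B, nonnegative a.
rng = np.random.default_rng(0)
d, m = 10, 4
B = rng.standard_normal((m, d))
a = rng.uniform(0.5, 1.5, size=m)
X, y = sample_data(a, B, n=1000, rng=rng)
print(empirical_risk(a, B, X, y))  # ~ noise_std**2 at the ground truth
```

At the ground-truth parameters the empirical risk is close to the noise variance, which is the baseline that any learned $(\hat{a}, \hat{B})$ is measured against.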

Cite

Text

Ge et al. "Learning One-Hidden-Layer Neural Networks with Landscape Design." International Conference on Learning Representations, 2018.

Markdown

[Ge et al. "Learning One-Hidden-Layer Neural Networks with Landscape Design." International Conference on Learning Representations, 2018.](https://mlanthology.org/iclr/2018/ge2018iclr-learning/)

BibTeX

@inproceedings{ge2018iclr-learning,
  title     = {{Learning One-Hidden-Layer Neural Networks with Landscape Design}},
  author    = {Ge, Rong and Lee, Jason D. and Ma, Tengyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2018},
  url       = {https://mlanthology.org/iclr/2018/ge2018iclr-learning/}
}