On Feature Learning in Neural Networks with Global Convergence Guarantees

Abstract

We study the gradient flow optimization of over-parameterized neural networks (NNs) in a setup that allows feature learning while admitting non-asymptotic global convergence guarantees. First, we prove that for wide shallow NNs under the mean-field (MF) scaling and with a general class of activation functions, when the input dimension is at least the size of the training set, the training loss converges to zero at a linear rate under gradient flow. Building upon this analysis, we study a model of wide multi-layer NNs with random and untrained weights in earlier layers, and also prove a linear-rate convergence of the training loss to zero, regardless of the input dimension. We also show empirically that, unlike in the Neural Tangent Kernel (NTK) regime, our multi-layer model exhibits feature learning and can achieve better generalization performance than its NTK counterpart.
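As a rough, illustrative sketch of the shallow setting described in the abstract (not the paper's exact construction or proof setup): a two-layer network under the mean-field 1/m output scaling, trained with full-batch gradient descent on squared loss, with input dimension d at least the number of training samples n. Everything below (tanh activation, Gaussian data and labels, widths, step size) is an illustrative assumption, and gradient descent here only approximates the gradient flow analyzed in the paper.

```python
import numpy as np

# Illustrative sketch only: a shallow network under mean-field (1/m) scaling,
#   f(x) = (1/m) * sum_i a_i * tanh(w_i . x),
# trained by full-batch gradient descent on squared loss. The input dimension d
# is taken to be at least the number of samples n, mirroring the abstract's
# assumption; data, labels, and hyperparameters are arbitrary illustrative choices.

rng = np.random.default_rng(0)
n, d, m = 20, 32, 2000                  # n samples, input dim d >= n, width m
X = rng.standard_normal((n, d)) / np.sqrt(d)   # rows have roughly unit norm
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))         # first-layer weights (trained)
a = rng.standard_normal(m)              # second-layer weights (trained)

# Under the 1/m output scaling, per-parameter gradients are O(1/m), so the step
# size is scaled by m (the usual mean-field time rescaling) to keep dynamics O(1).
lr = 0.3 * m
for t in range(2001):
    pre = X @ W.T                       # (n, m) pre-activations
    act = np.tanh(pre)                  # (n, m) hidden activations
    f = act @ a / m                     # mean-field scaling: average over units
    resid = f - y                       # residuals of the loss 0.5 * ||f - y||^2
    grad_a = act.T @ resid / m          # dL/da
    dact = 1.0 - act**2                 # tanh'(pre)
    grad_W = ((resid[:, None] * dact) * a[None, :]).T @ X / m   # dL/dW
    a -= lr * grad_a
    W -= lr * grad_W
    if t % 500 == 0:
        print(f"step {t:4d}   train loss {0.5 * np.sum(resid**2):.3e}")
```

The multi-layer model in the abstract additionally keeps earlier layers random and untrained; in a sketch of this kind, that would amount to replacing X with the output of fixed, randomly initialized layers, though the paper should be consulted for the precise parameterization.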

Cite

Text

Chen et al. "On Feature Learning in Neural Networks with Global Convergence Guarantees." International Conference on Learning Representations, 2022.

Markdown

[Chen et al. "On Feature Learning in Neural Networks with Global Convergence Guarantees." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/chen2022iclr-feature/)

BibTeX

@inproceedings{chen2022iclr-feature,
  title     = {{On Feature Learning in Neural Networks with Global Convergence Guarantees}},
  author    = {Chen, Zhengdao and Vanden-Eijnden, Eric and Bruna, Joan},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/chen2022iclr-feature/}
}