Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

Abstract

We consider gradient-based optimisation of wide, shallow neural networks whose hidden-node outputs are multiplied by positive scale parameters. These scale parameters are non-identical across nodes, in contrast to the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large networks, with high probability, gradient flow converges to a global minimum and, unlike in the NTK regime, can learn features.
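To make the setup concrete, here is a minimal sketch, in NumPy, of a shallow network whose hidden-node outputs are multiplied by fixed positive scale parameters. This is an illustrative form only, not necessarily the paper's exact parameterisation: setting every scale to 1/sqrt(m) mimics the classical NTK scaling, while the non-identical, decaying scales below illustrate the asymmetrical node scaling described in the abstract.

```python
# Illustrative sketch (assumed form, not the paper's exact parameterisation):
# a shallow network f(x) = sum_j lam_j * v_j * sigma(<W_j, x>), where the
# lam_j are fixed positive node-scaling parameters.
import numpy as np

rng = np.random.default_rng(0)

m, d = 1000, 5                       # hidden width and input dimension
W = rng.standard_normal((m, d))      # input-to-hidden weights
v = rng.standard_normal(m)           # hidden-to-output weights

# Symmetric, NTK-style scaling: identical scale 1/sqrt(m) for every node.
lam_ntk = np.full(m, 1.0 / np.sqrt(m))

# Asymmetrical scaling (illustrative choice): non-identical, decaying scales,
# normalised so that their squared values sum to one.
lam_asym = 1.0 / np.arange(1, m + 1)
lam_asym = lam_asym / np.linalg.norm(lam_asym)

def forward(x, lam):
    """Shallow network output with per-node scaling lam_j."""
    hidden = np.maximum(W @ x, 0.0)  # ReLU hidden activations
    return float(np.sum(lam * v * hidden))

x = rng.standard_normal(d)
print(forward(x, lam_ntk), forward(x, lam_asym))
```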

Cite

Text

Ayed et al. "Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning." NeurIPS 2023 Workshops: M3L, 2023.

Markdown

[Ayed et al. "Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning." NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/ayed2023neuripsw-overparameterised/)

BibTeX

@inproceedings{ayed2023neuripsw-overparameterised,
  title     = {{Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning}},
  author    = {Ayed, Fadhel and Caron, Francois and Jung, Paul and Lee, Juho and Lee, Hoil and Yang, Hongseok},
  booktitle = {NeurIPS 2023 Workshops: M3L},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/ayed2023neuripsw-overparameterised/}
}