Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning
Abstract
We consider gradient-based optimisation of wide, shallow neural networks whose hidden-node outputs are scaled by positive scale parameters. The scale parameters are non-identical, in contrast to the classical Neural Tangent Kernel (NTK) parameterisation. We prove that, for large networks, with high probability, gradient flow converges to a global minimum and, unlike in the NTK regime, can learn features.
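The parameterisation described in the abstract can be illustrated with a minimal sketch (not the authors' code): a shallow ReLU network whose hidden-node outputs are weighted by fixed, positive, non-identical scale parameters, contrasted with the identical 1/n scaling of the NTK parameterisation. The power-law choice of the scale parameters and the exact functional form below are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of asymmetrical node scaling (illustrative assumption, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
n_hidden, d = 1000, 5

# First- and second-layer weights, initialised i.i.d. standard Gaussian.
W = rng.standard_normal((n_hidden, d))
a = rng.standard_normal(n_hidden)

# Asymmetrical node scaling: positive, non-identical scale parameters summing to one
# (power-law decay chosen here purely for illustration).
lam = np.arange(1, n_hidden + 1, dtype=float) ** -1.5
lam /= lam.sum()

# Classical NTK scaling for comparison: every node gets the same weight 1/n.
lam_ntk = np.full(n_hidden, 1.0 / n_hidden)

def forward(x, scales):
    """f(x) = sum_j sqrt(scales_j) * a_j * relu(w_j . x)."""
    h = np.maximum(W @ x, 0.0)          # hidden-node outputs
    return np.sqrt(scales) @ (a * h)    # weighted sum over nodes

x = rng.standard_normal(d)
print("asymmetric scaling:", forward(x, lam))
print("NTK scaling:       ", forward(x, lam_ntk))
```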
Cite
Text
Ayed et al. "Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning." NeurIPS 2023 Workshops: M3L, 2023.
Markdown
[Ayed et al. "Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning." NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/ayed2023neuripsw-overparameterised/)
BibTeX
@inproceedings{ayed2023neuripsw-overparameterised,
title = {{Over-Parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning}},
author = {Ayed, Fadhel and Caron, Francois and Jung, Paul and Lee, Juho and Lee, Hoil and Yang, Hongseok},
booktitle = {NeurIPS 2023 Workshops: M3L},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/ayed2023neuripsw-overparameterised/}
}