Nonlinear Behaviour of Critical Points for a Simple Neural Network
Abstract
In severely over-parametrized regimes, neural network optimization can be analyzed by linearization techniques such as the neural tangent kernel, which shows that gradient descent converges to zero training error, and by landscape analysis, which shows that all local minima are global minima. Practical networks are often far less over-parametrized, and their training behavior becomes more nuanced and nonlinear. This paper provides a fine-grained analysis of this nonlinearity for a simple shallow network in one dimension. We show that these networks have unfavorable critical points, which can be mitigated by sufficiently high local resolution. Given this resolution, all critical points satisfy $L_2$ loss bounds of optimal adaptive approximation in Sobolev and Besov spaces on convex and concave subdomains of the target function. These bounds cannot be matched by linear approximation methods and demonstrate the nonlinear and global behavior of the critical points' inner weights.
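To make the setting concrete, the following is a minimal numpy sketch of the kind of problem the abstract describes, not the paper's exact construction: a shallow ReLU network in one dimension trained by gradient descent on the $L_2$ loss toward a critical point. The width, learning rate, step count, and convex target $f^*(x) = x^2$ are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's setup): a shallow 1-D ReLU network
# f(x) = sum_i a_i * relu(w_i * x + b_i), trained by plain gradient
# descent on the mean squared (L2) loss for a convex target y = x^2.

rng = np.random.default_rng(0)
n, width = 64, 20                       # sample count and hidden width (assumed)
x = np.linspace(-1.0, 1.0, n)
y = x**2                                # convex target on [-1, 1]

w = rng.normal(size=width)              # inner weights
b = rng.normal(size=width)              # inner biases
a = rng.normal(size=width) / width      # outer weights (small init)

loss0 = 0.5 * np.mean((np.maximum(np.outer(x, w) + b, 0.0) @ a - y) ** 2)

lr = 0.05
for _ in range(2000):
    pre = np.outer(x, w) + b            # (n, width) pre-activations
    h = np.maximum(pre, 0.0)            # ReLU features
    r = h @ a - y                       # residual f(x) - y
    act = (pre > 0).astype(float)       # ReLU derivative
    # Gradients of the loss 0.5/n * ||f - y||^2
    grad_a = h.T @ r / n
    grad_w = (r[:, None] * act * a * x[:, None]).sum(0) / n
    grad_b = (r[:, None] * act * a).sum(0) / n
    a -= lr * grad_a
    w -= lr * grad_w
    b -= lr * grad_b

loss = 0.5 * np.mean((np.maximum(np.outer(x, w) + b, 0.0) @ a - y) ** 2)
```

Near convergence, the iterates approach a critical point of the loss; the paper's results concern which critical points can arise and how well they approximate the target on its convex and concave subdomains.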
Cite
Text
Welper. "Nonlinear Behaviour of Critical Points for a Simple Neural Network." Transactions on Machine Learning Research, 2024.
Markdown
[Welper. "Nonlinear Behaviour of Critical Points for a Simple Neural Network." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/welper2024tmlr-nonlinear/)
BibTeX
@article{welper2024tmlr-nonlinear,
title = {{Nonlinear Behaviour of Critical Points for a Simple Neural Network}},
author = {Welper, Gerrit},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/welper2024tmlr-nonlinear/}
}