Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
Abstract
We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_K$ determined by the Neural Tangent Kernel at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d - 1}$ and rotation-invariant weight distributions, the eigenfunctions of $T_K$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of ``Damped Deviations'', where deviations of the NTK matter less for eigendirections with large eigenvalues. Beyond the underparameterized regime, the damped-deviations point of view allows us to extend certain existing results in the overparameterized setting.
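As a minimal sketch of the spectral picture described above, consider the idealized setting where the NTK is held fixed at its limiting kernel $K$ with integral operator $T_K$ (the symbols $u_t$, $f^*$, $\lambda_i$, $\phi_i$ below are illustrative notation, not taken from the paper). Up to constants, gradient flow on the MSE then evolves the network function $u_t$ in function space as
$$\partial_t u_t = -T_K\,(u_t - f^*), \qquad T_K \phi_i = \lambda_i \phi_i,$$
so the residual decomposes along the eigenfunctions of $T_K$ as
$$u_t - f^* = \sum_i e^{-\lambda_i t}\,\big\langle u_0 - f^*,\, \phi_i \big\rangle\,\phi_i,$$
and each eigendirection $\phi_i$ is learned at a rate governed by its eigenvalue $\lambda_i$. For uniform data on $S^{d-1}$ with rotation-invariant weight distributions, these $\phi_i$ are the spherical harmonics.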
Cite
Text
Bowman and Montufar. "Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks." International Conference on Learning Representations, 2022.
Markdown
[Bowman and Montufar. "Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/bowman2022iclr-implicit/)
BibTeX
@inproceedings{bowman2022iclr-implicit,
  title     = {{Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks}},
  author    = {Bowman, Benjamin and Montufar, Guido},
  booktitle = {International Conference on Learning Representations},
  year      = {2022},
  url       = {https://mlanthology.org/iclr/2022/bowman2022iclr-implicit/}
}