Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data

Abstract

The implicit biases of gradient-based optimization algorithms are conjectured to be a major factor in the success of modern deep learning. In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations when the training data are nearly-orthogonal, a common property of high-dimensional data. For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that asymptotically, gradient flow produces a neural network with rank at most two. Moreover, this network is an $\ell_2$-max-margin solution (in parameter space), and has a linear decision boundary that corresponds to an approximate-max-margin linear predictor. For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training. We provide experiments which suggest that a small initialization scale is important for finding low-rank neural networks with gradient descent.
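
A minimal, hypothetical sketch of the setting the abstract describes, not the authors' experimental code: train a two-layer fully-connected leaky ReLU network with a small initialization scale by plain gradient descent on high-dimensional Gaussian data (which is nearly orthogonal when the dimension far exceeds the sample size), and track the stable rank of the first-layer weight matrix as one proxy for low rank. All concrete values (width, dimension, leaky slope, step size, init scale) are illustrative assumptions.

import torch

torch.manual_seed(0)
n, d, m = 20, 1000, 50        # samples, input dimension, hidden width; d >> n gives near-orthogonal data (assumed values)
init_scale = 1e-3             # small initialization scale (assumed value)
lr, leaky_slope = 0.1, 0.1    # step size and leaky ReLU slope (assumed values)

# Nearly-orthogonal data: i.i.d. Gaussian inputs in high dimension with random +/-1 labels.
X = torch.randn(n, d) / d**0.5
y = torch.randint(0, 2, (n,)).float() * 2 - 1

# Two-layer network: trainable first layer W, fixed random +/-1 second layer a (a common simplification).
W = (torch.randn(m, d) * init_scale).requires_grad_(True)
a = (torch.randint(0, 2, (m,)).float() * 2 - 1) / m

def stable_rank(M):
    # Stable rank ||M||_F^2 / ||M||_2^2, a smooth proxy for the rank of M.
    s = torch.linalg.svdvals(M)
    return float((s ** 2).sum() / s[0] ** 2)

for step in range(1000):
    logits = torch.nn.functional.leaky_relu(X @ W.T, negative_slope=leaky_slope) @ a
    loss = torch.nn.functional.softplus(-y * logits).mean()   # logistic loss
    loss.backward()
    with torch.no_grad():
        W -= lr * W.grad
        W.grad.zero_()
    if step in (0, 1, 999):
        print(f"step {step}: loss {loss.item():.4f}, stable rank of W = {stable_rank(W.detach()):.2f}")

Under the regime studied in the paper (small initialization, nearly-orthogonal data), the stable rank of W is expected to drop sharply after the first gradient step and remain small during training; this sketch only illustrates how one might observe that behavior, and makes no claim about reproducing the paper's exact experiments.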

Cite

Text

Frei et al. "Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data." International Conference on Learning Representations, 2023.

Markdown

[Frei et al. "Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/frei2023iclr-implicit/)

BibTeX

@inproceedings{frei2023iclr-implicit,
  title     = {{Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data}},
  author    = {Frei, Spencer and Vardi, Gal and Bartlett, Peter and Srebro, Nathan and Hu, Wei},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/frei2023iclr-implicit/}
}