Implicit Bias of the Step Size in Linear Diagonal Neural Networks

Abstract

Focusing on diagonal linear networks as a model for understanding the implicit bias in underdetermined models, we show how the gradient descent step size can have a large qualitative effect on the implicit bias, and thus on generalization ability. In particular, we show how using a large step size for non-centered data can change the implicit bias from a "kernel" type behavior to a "rich" (sparsity-inducing) regime — even when gradient flow, studied in previous works, would not escape the "kernel" regime. We do so by using dynamic stability, proving that convergence to dynamically stable global minima entails a bound on some weighted $\ell_1$-norm of the linear predictor, i.e., a "rich" regime. We prove this leads to good generalization in a sparse regression setting.
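To make the setup concrete, below is a minimal NumPy sketch of the kind of model the abstract studies: a diagonal linear network with the parametrization $\beta = u \odot u - v \odot v$ (a common choice in this literature; the paper's exact parametrization, data model, and step sizes may differ), trained with full-batch gradient descent on an underdetermined sparse regression problem with non-centered data. All constants (dimensions, step sizes, thresholds) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch, NOT the paper's exact construction: a diagonal linear
# network parametrizes the predictor as beta = u*u - v*v (elementwise) and
# is trained with full-batch gradient descent on the squared loss. The
# abstract's claim is that large (yet still dynamically stable) step sizes
# bias GD toward sparse ("rich") solutions, whereas vanishingly small step
# sizes track gradient flow, which may stay in the "kernel" regime.

rng = np.random.default_rng(0)
n, d = 20, 40                       # underdetermined: fewer samples than features
X = rng.normal(size=(n, d)) + 1.0   # non-centered data, as emphasized in the abstract
beta_star = np.zeros(d)
beta_star[:3] = 1.0                 # sparse ground-truth predictor
y = X @ beta_star

def train(step_size, iters=100_000, init_scale=0.5):
    """Full-batch GD on L(u, v) = (1/2n) * ||X (u*u - v*v) - y||^2."""
    u = np.full(d, init_scale)
    v = np.full(d, init_scale)
    for _ in range(iters):
        beta = u * u - v * v
        g = X.T @ (X @ beta - y) / n            # gradient of the loss w.r.t. beta
        u, v = u - step_size * 2 * u * g, v + step_size * 2 * v * g
        if not np.all(np.isfinite(u)):          # guard: too large a step size diverges
            raise ValueError("diverged; reduce step_size")
    return u * u - v * v

# Illustrative step sizes (hypothetical, not taken from the paper); the
# qualitative sparsity gap may require tuning on a given random instance.
for lr in (1e-4, 5e-3):
    beta = train(lr)
    nnz = int(np.sum(np.abs(beta) > 1e-2))
    print(f"step size {lr:g}: train residual {np.linalg.norm(X @ beta - y):.2e}, "
          f"coords with |beta_i| > 1e-2: {nnz}/{d}")
```

Under the abstract's thesis, one would expect the larger step size to return a predictor with most coordinates near zero, while the tiny step size approximates gradient flow and, at this moderate initialization scale, need not escape the kernel-like regime.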

Cite

Text

Nacson et al. "Implicit Bias of the Step Size in Linear Diagonal Neural Networks." International Conference on Machine Learning, 2022.

Markdown

[Nacson et al. "Implicit Bias of the Step Size in Linear Diagonal Neural Networks." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/nacson2022icml-implicit/)

BibTeX

@inproceedings{nacson2022icml-implicit,
  title     = {{Implicit Bias of the Step Size in Linear Diagonal Neural Networks}},
  author    = {Nacson, Mor Shpigel and Ravichandran, Kavya and Srebro, Nathan and Soudry, Daniel},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {16270--16295},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/nacson2022icml-implicit/}
}