Feature Learning in $l_2$-Regularized DNNs: Attraction/Repulsion and Sparsity

Abstract

We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal with respect to an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activation of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set). We show that this bound is tight by giving an example of a local minimum that requires $N^{2}/4$ hidden neurons. But we also observe numerically that in more traditional settings far fewer than $N^{2}$ neurons are required to reach the minima.
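The objective studied in the paper can be sketched concretely. The following is a minimal illustration (not the paper's reformulation) of an $L_{2}$-regularized loss for a one-hidden-layer ReLU network, where the hidden representation plays the role of $Z_{1}$; all names (`W1`, `W2`, `lam`) and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 8, 3, 16           # training-set size N, input dim, hidden width
X = rng.normal(size=(N, d))  # training inputs
y = rng.normal(size=(N, 1))  # training targets
W1 = rng.normal(size=(d, m)) # first-layer weights
W2 = rng.normal(size=(m, 1)) # output weights
lam = 1e-3                   # L2 regularization strength

# Hidden representation of the training set (ReLU is positively homogeneous)
Z1 = np.maximum(X @ W1, 0.0)
pred = Z1 @ W2

# Regularized loss: data-fitting term plus L2 penalty on all parameters
data_loss = np.mean((pred - y) ** 2)
reg = lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
loss = data_loss + reg
```

The sparsity result says that at a local minimum of this kind of objective, the hidden width `m` never needs to exceed $N(N+1)$, regardless of how over-parameterized the network initially is.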

Cite

Text

Jacot et al. "Feature Learning in $l_2$-Regularized DNNs: Attraction/Repulsion and Sparsity." Neural Information Processing Systems, 2022.

Markdown

[Jacot et al. "Feature Learning in $l_2$-Regularized DNNs: Attraction/Repulsion and Sparsity." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/jacot2022neurips-feature/)

BibTeX

@inproceedings{jacot2022neurips-feature,
  title     = {{Feature Learning in $l_2$-Regularized DNNs: Attraction/Repulsion and Sparsity}},
  author    = {Jacot, Arthur and Golikov, Eugene and Hongler, Clement and Gabriel, Franck},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/jacot2022neurips-feature/}
}