Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff

Abstract

Previous work has shown that DNNs with large depth $L$ and $L_{2}$-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the Bottleneck rank. We compute finite depth corrections to this result, revealing a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the Jacobian $\left\|Jf(x)\right\|_{+}$ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the 'right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$. Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK, which in turn guarantees infinite depth convergence of the representations of almost all layers.
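
The spectral structure claimed in the abstract can be inspected empirically on the weight matrices of a trained network. The snippet below is a minimal, illustrative sketch and not code from the paper: the hypothetical helper bottleneck_profile counts, per layer, how many singular values of $W_{\ell}$ lie close to 1 and reports the magnitude of the rest; random toy matrices built with the claimed spectrum stand in for trained weights.

import numpy as np

def bottleneck_profile(weights, tol=0.1):
    """For each weight matrix, count singular values within `tol` of 1
    and report the largest remaining one (hypothetical helper for
    illustration; not from the paper)."""
    profile = []
    for W in weights:
        s = np.linalg.svd(W, compute_uv=False)
        near_one = int(np.sum(np.abs(s - 1.0) < tol))   # ~ R^(0)(f) for most layers
        rest = s[np.abs(s - 1.0) >= tol]
        profile.append((near_one, float(rest.max()) if rest.size else 0.0))
    return profile

# Toy matrices with R^(0)(f) = 3 singular values at 1 and the rest O(L^{-1/2}),
# standing in for the trained weights W_1, ..., W_L of a deep network.
L, width, r = 20, 64, 3
weights = []
for _ in range(L):
    U, _ = np.linalg.qr(np.random.randn(width, width))
    V, _ = np.linalg.qr(np.random.randn(width, width))
    s = np.concatenate([np.ones(r), 0.3 / np.sqrt(L) * np.ones(width - r)])
    weights.append(U @ np.diag(s) @ V.T)

print(bottleneck_profile(weights))  # expect (3, small value) for every layer

On actual trained weights, the abstract predicts this profile for almost all layers once the depth is large, with the "rest" entries shrinking at rate $O(L^{-\frac{1}{2}})$.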

Cite

Text

Jacot. "Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff." Neural Information Processing Systems, 2023.

Markdown

[Jacot. "Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/jacot2023neurips-bottleneck/)

BibTeX

@inproceedings{jacot2023neurips-bottleneck,
  title     = {{Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff}},
  author    = {Jacot, Arthur},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/jacot2023neurips-bottleneck/}
}