Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

Abstract

Modern neural networks are usually highly over-parameterized. Behind the wide use of over-parameterized networks is the belief that, if the data are simple, the trained network will automatically be equivalent to a simple predictor. Following this intuition, many existing works have studied different notions of the "rank" of neural networks and its relation to the rank of the data. In this work, we study the rank of convolutional neural networks (CNNs) trained by gradient descent, with a specific focus on the robustness of this rank to noise in the data. Specifically, we point out that, when noise is added to the data inputs, the rank of a CNN trained with gradient descent is affected far less than the rank of the data itself; even after a significant amount of noise has been added, the CNN filters can still effectively recover the intrinsic dimension of the clean data. We back up our claim with a theoretical case study in which we consider data points consisting of "signals" and "noises", and we rigorously prove that CNNs trained by gradient descent can learn the intrinsic dimension of the data signals.
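The contrast the abstract draws can be seen directly on the data side: a low-rank signal matrix becomes numerically full rank once Gaussian noise is added. The sketch below (our own illustration, not code from the paper; the dimensions, noise level, and tolerance are arbitrary choices) compares the numerical rank of a rank-k signal matrix before and after adding noise.

```python
# Illustrative sketch (not from the paper): additive noise inflates the
# numerical rank of a data matrix, while the underlying signal remains
# low-dimensional. All names and constants here are our own choices.
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 200, 100, 3            # samples, ambient dimension, intrinsic dimension
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, d))
signal = U @ V                   # exactly rank-k "signal" matrix

def numerical_rank(X, tol=1e-6):
    """Count singular values above tol times the largest singular value."""
    s = np.linalg.svd(X, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

# Add i.i.d. Gaussian noise to every entry of the data matrix.
noisy = signal + 0.5 * rng.standard_normal((n, d))

print(numerical_rank(signal))   # 3: the intrinsic dimension of the signal
print(numerical_rank(noisy))    # 100 = min(n, d): the noisy data is full rank
```

This is exactly the fragility the paper contrasts with the trained CNN: the data's rank jumps from the intrinsic dimension to full rank under noise, whereas the claim is that the rank of the gradient-descent-trained filters stays close to the intrinsic dimension.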

Cite

Text

Zhang et al. "Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Zhang et al. "Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/zhang2024icmlw-gradient/)

BibTeX

@inproceedings{zhang2024icmlw-gradient,
  title     = {{Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks}},
  author    = {Zhang, Chenyang and Gao, Peifeng and Zou, Difan and Cao, Yuan},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/zhang2024icmlw-gradient/}
}