A Non-Parametric Regression Viewpoint : Generalization of Overparametrized Deep ReLU Network Under Noisy Observations
Abstract
We study the generalization properties of overparameterized deep neural networks (DNNs) with Rectified Linear Unit (ReLU) activations. Under the non-parametric regression framework, the ground-truth function is assumed to lie in a reproducing kernel Hilbert space (RKHS) induced by the neural tangent kernel (NTK) of a ReLU DNN, and the observations are corrupted by noise. Without a delicate adoption of early stopping, we prove that an overparametrized DNN trained by vanilla gradient descent does not recover the ground-truth function: the estimated DNN's $L_{2}$ prediction error remains bounded away from $0$. As a complement to this result, we show that $\ell_{2}$-regularized gradient descent enables the overparametrized DNN to achieve the minimax optimal convergence rate of the $L_{2}$ prediction error, without early stopping. Notably, the rate we obtain is faster than the $\mathcal{O}(n^{-1/2})$ rate known in the literature.
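To make the contrast above concrete, here is a minimal, self-contained sketch (not the paper's construction or proof technique): an overparameterized two-layer ReLU network, with only the first layer trained under NTK-style scaling, is fit to noisy one-dimensional data once by vanilla gradient descent and once by gradient descent on an $\ell_2$-penalized objective. The ground-truth function, width, step size, iteration budget, noise level, and regularization strength are all hypothetical choices made for illustration, and the penalty used here (squared distance of the weights from their initialization) is one common variant of $\ell_2$ regularization in the NTK literature.

# Illustrative sketch only (hypothetical setup, not the paper's deep-network analysis):
# contrast vanilla gradient descent with l2-regularized gradient descent when fitting
# an overparameterized two-layer ReLU network to noisy 1-D regression data.
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations y_i = f0(x_i) + eps_i of a smooth ground-truth function f0.
n, sigma = 50, 0.3
x = np.sort(rng.uniform(-1.0, 1.0, size=(n, 1)), axis=0)
f0 = np.sin(np.pi * x)
y = f0 + sigma * rng.standard_normal((n, 1))

# Append a constant feature so ReLU kinks can occur anywhere in [-1, 1].
X = np.hstack([x, np.ones_like(x)])               # shape (n, 2)

# Overparameterized two-layer ReLU network with 1/sqrt(m) (NTK-style) scaling.
# As a common simplification, only the first-layer weights W are trained and
# the output signs a stay fixed at their random initialization.
m = 1024
W0 = rng.standard_normal((2, m))
a = rng.choice([-1.0, 1.0], size=(m, 1))

def predict(W, X):
    return np.maximum(X @ W, 0.0) @ a / np.sqrt(m)

def train(lam, steps=30_000, lr=1.0):
    """Full-batch GD on 1/(2n) * sum_i (f(x_i) - y_i)^2 + (lam/2) * ||W - W0||_F^2."""
    W = W0.copy()
    for _ in range(steps):
        h = X @ W                                  # pre-activations, shape (n, m)
        resid = np.maximum(h, 0.0) @ a / np.sqrt(m) - y
        grad = X.T @ ((resid @ a.T) * (h > 0)) / (np.sqrt(m) * n)
        grad += lam * (W - W0)                     # l2 penalty toward the initialization
        W -= lr * grad
    return W

W_plain = train(lam=0.0)     # vanilla GD: keeps fitting the noise as iterations grow
W_reg = train(lam=0.002)     # l2-regularized GD: converges to a ridge-type estimate

for name, Wt in [("vanilla GD", W_plain), ("l2-regularized GD", W_reg)]:
    train_err = float(np.mean((predict(Wt, X) - y) ** 2))    # fit to noisy labels
    est_err = float(np.mean((predict(Wt, X) - f0) ** 2))     # error to the ground truth
    print(f"{name:>18s}: training MSE = {train_err:.4f}, MSE to f0 = {est_err:.4f}")

Printing both errors is the point of the sketch: vanilla gradient descent keeps driving the training error toward zero by fitting the noise, while the regularized iterates converge, without any early stopping, to a kernel-ridge-type estimate whose error to the ground truth is meant to stay controlled, mirroring the dichotomy stated in the abstract.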
Cite
Text
Suh et al. "A Non-Parametric Regression Viewpoint : Generalization of Overparametrized Deep ReLU Network Under Noisy Observations." International Conference on Learning Representations, 2022.
Markdown
[Suh et al. "A Non-Parametric Regression Viewpoint : Generalization of Overparametrized Deep ReLU Network Under Noisy Observations." International Conference on Learning Representations, 2022.](https://mlanthology.org/iclr/2022/suh2022iclr-nonparametric/)
BibTeX
@inproceedings{suh2022iclr-nonparametric,
title = {{A Non-Parametric Regression Viewpoint : Generalization of Overparametrized Deep ReLU Network Under Noisy Observations}},
author = {Suh, Namjoon and Ko, Hyunouk and Huo, Xiaoming},
booktitle = {International Conference on Learning Representations},
year = {2022},
url = {https://mlanthology.org/iclr/2022/suh2022iclr-nonparametric/}
}