On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models

Abstract

In this paper, we study the generalization performance of minimum $\ell_2$-norm overfitting solutions for the neural tangent kernel (NTK) model of a two-layer neural network with ReLU activation that has no bias term. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics that are different from the "double descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, we provide a new upper bound on the generalization error that approaches a small limiting value, even when the number of neurons $p$ approaches infinity. This limiting value further decreases with the number of training samples $n$. For functions outside of this class, we provide a lower bound on the generalization error that does not diminish to zero even when $n$ and $p$ are both large.
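The setting described in the abstract, an NTK (linearized) model of a two-layer ReLU network without bias, fit to the training data with the minimum $\ell_2$-norm overfitting solution, can be sketched concretely as below. This is a minimal illustrative sketch, not the authors' code: the toy data, the function names, the $1/\sqrt{p}$ scaling convention, and the use of the pseudoinverse to obtain the min-norm interpolant are assumptions made for illustration only.

```python
import numpy as np

def ntk_features(X, W):
    """NTK feature map of a two-layer ReLU network with no bias term.

    For each neuron with fixed random weight w_r, the gradient of the
    network output with respect to that neuron's bottom-layer weights is
    x * 1{w_r^T x > 0}; concatenating over the p neurons gives the NTK
    feature vector. X: (n, d) inputs, W: (p, d) random initial weights.
    Returns an (n, p*d) feature matrix.
    """
    n, d = X.shape
    p = W.shape[0]
    acts = (X @ W.T > 0).astype(float)        # (n, p) ReLU activation pattern
    feats = acts[:, :, None] * X[:, None, :]  # (n, p, d) per-neuron features
    return feats.reshape(n, p * d) / np.sqrt(p)

# Hypothetical toy data: n inputs normalized to the unit sphere in d
# dimensions, p neurons, and an arbitrary ground-truth function.
rng = np.random.default_rng(0)
n, d, p = 50, 5, 2000
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = np.sin(X[:, 0])
W = rng.normal(size=(p, d))                   # fixed random first layer

Phi = ntk_features(X, W)
# Min l2-norm interpolating (overfitting) solution: delta_w = Phi^+ y.
delta_w = np.linalg.pinv(Phi) @ y
print(np.max(np.abs(Phi @ delta_w - y)))      # ~0: training data fit exactly
```

In this overparameterized regime ($pd \gg n$) the linear system has infinitely many interpolating solutions; the pseudoinverse picks the one with the smallest $\ell_2$ norm, which is the overfitted solution whose test error the paper analyzes.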

Cite

Text

Ju et al. "On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models." International Conference on Machine Learning, 2021.

Markdown

[Ju et al. "On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/ju2021icml-generalization/)

BibTeX

@inproceedings{ju2021icml-generalization,
  title     = {{On the Generalization Power of Overfitted Two-Layer Neural Tangent Kernel Models}},
  author    = {Ju, Peizhong and Lin, Xiaojun and Shroff, Ness},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {5137--5147},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/ju2021icml-generalization/}
}