Non-Vacuous Generalisation Bounds for Shallow Neural Networks

Abstract

We focus on a specific class of shallow neural networks with a single hidden layer, namely those with $L_2$-normalised data and either a sigmoid-shaped Gaussian error function (“erf”) activation or a Gaussian Error Linear Unit (GELU) activation. For these networks, we derive new generalisation bounds through the PAC-Bayesian theory; unlike most existing such bounds, they apply to neural networks with deterministic rather than randomised parameters. Our bounds are empirically non-vacuous when the network is trained with vanilla stochastic gradient descent on MNIST and Fashion-MNIST.
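For concreteness, the following is a minimal sketch (not the authors' code) of the network class the abstract describes: a single-hidden-layer network applied to $L_2$-normalised inputs, with either an erf or a GELU activation. The layer widths, initialisation scale, and argmax prediction rule are illustrative assumptions, not taken from the paper.

```python
# Sketch of the shallow network class from the abstract (illustrative only).
import numpy as np
from scipy.special import erf

def l2_normalise(x, eps=1e-12):
    """Project each input onto the unit L2 sphere, as assumed in the abstract."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def gelu(z):
    """Gaussian Error Linear Unit, expressed via the Gaussian error function."""
    return z * 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def shallow_net(x, W1, b1, W2, b2, activation="erf"):
    """Single hidden layer: activation(x W1 + b1) W2 + b2 on normalised inputs."""
    x = l2_normalise(x)
    h = x @ W1 + b1
    h = erf(h) if activation == "erf" else gelu(h)
    return h @ W2 + b2

# Toy usage with hypothetical shapes (e.g. MNIST: 784 inputs, 10 classes).
rng = np.random.default_rng(0)
d, m, k = 784, 128, 10                      # input dim, hidden width, classes
W1, b1 = 0.1 * rng.standard_normal((d, m)), np.zeros(m)
W2, b2 = 0.1 * rng.standard_normal((m, k)), np.zeros(k)
logits = shallow_net(rng.standard_normal((5, d)), W1, b1, W2, b2)
print(logits.argmax(axis=1))                # predicted labels for 5 toy inputs
```

In the paper's setting such a network would be trained with vanilla stochastic gradient descent; the PAC-Bayesian bounds are then evaluated for the resulting deterministic weights.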

Cite

Text

Biggs and Guedj. "Non-Vacuous Generalisation Bounds for Shallow Neural Networks." International Conference on Machine Learning, 2022.

Markdown

[Biggs and Guedj. "Non-Vacuous Generalisation Bounds for Shallow Neural Networks." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/biggs2022icml-nonvacuous/)

BibTeX

@inproceedings{biggs2022icml-nonvacuous,
  title     = {{Non-Vacuous Generalisation Bounds for Shallow Neural Networks}},
  author    = {Biggs, Felix and Guedj, Benjamin},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {1963--1981},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/biggs2022icml-nonvacuous/}
}