Rate of Convergence of Polynomial Networks to Gaussian Processes

Abstract

We examine one-hidden-layer neural networks with random weights. It is well known that, in the limit of infinitely many neurons, they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in the 2-Wasserstein metric is O(n^{-1/2}), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We improve the known convergence rates for other activations: to a power law in n for ReLU, and to an inverse square root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
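The setup can be illustrated with a short simulation. The sketch below is not taken from the paper; the quadratic activation, Gaussian weight distribution, and the crude Monte-Carlo 2-Wasserstein estimate are illustrative assumptions. It samples a scalar output f(x) = n^{-1/2} Σ_i a_i σ(w_i·x) of a random one-hidden-layer network and compares its law to the limiting Gaussian N(0, K(x,x)), where K(x,x) = E[σ(w·x)^2], so the reported distance should shrink roughly like n^{-1/2} (up to Monte-Carlo noise).

```python
import numpy as np
from scipy.stats import norm

# Illustrative sketch (not the paper's construction): empirical 2-Wasserstein
# distance between a scalar network output and its Gaussian-process limit.

rng = np.random.default_rng(0)

def poly_activation(t):
    return t ** 2  # example polynomial activation (degree 2)

def network_samples(x, n_hidden, n_samples):
    """Draw samples of f(x) = n^{-1/2} * sum_i a_i * sigma(w_i . x)."""
    d = x.shape[0]
    out = np.empty(n_samples)
    for s in range(n_samples):
        W = rng.standard_normal((n_hidden, d))   # w_i ~ N(0, I_d), an assumption
        a = rng.standard_normal(n_hidden)        # a_i ~ N(0, 1)
        out[s] = a @ poly_activation(W @ x) / np.sqrt(n_hidden)
    return out

def limit_variance(x, n_mc=200_000):
    """Monte-Carlo estimate of the limit kernel K(x, x) = E[ sigma(w . x)^2 ]."""
    W = rng.standard_normal((n_mc, x.shape[0]))
    return np.mean(poly_activation(W @ x) ** 2)

def w2_to_gaussian(samples, var):
    """Rough 1-D 2-Wasserstein distance between samples and N(0, var)."""
    m = len(samples)
    q = norm.ppf((np.arange(m) + 0.5) / m, scale=np.sqrt(var))
    return np.sqrt(np.mean((np.sort(samples) - q) ** 2))

x = np.array([0.6, -0.8])            # a fixed input
var = limit_variance(x)
for n in (10, 100, 1000):
    samples = network_samples(x, n_hidden=n, n_samples=20_000)
    print(f"n = {n:5d}   W2 ~ {w2_to_gaussian(samples, var):.4f}")
```

With only a single input the comparison is one-dimensional; the paper's result concerns the joint law at several inputs (a non-isotropic Gaussian limit), which this toy experiment does not capture.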

Cite

Text

Klukowski. "Rate of Convergence of Polynomial Networks to Gaussian Processes." Conference on Learning Theory, 2022.

Markdown

[Klukowski. "Rate of Convergence of Polynomial Networks to Gaussian Processes." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/klukowski2022colt-rate/)

BibTeX

@inproceedings{klukowski2022colt-rate,
  title     = {{Rate of Convergence of Polynomial Networks to Gaussian Processes}},
  author    = {Klukowski, Adam},
  booktitle = {Conference on Learning Theory},
  year      = {2022},
  pages     = {701--722},
  volume    = {178},
  url       = {https://mlanthology.org/colt/2022/klukowski2022colt-rate/}
}