Rate of Convergence of Polynomial Networks to Gaussian Processes
Abstract
We examine one-hidden-layer neural networks with random weights. It is well known that, in the limit of infinitely many neurons, they simplify to Gaussian processes. For networks with a polynomial activation, we demonstrate that the rate of this convergence in the 2-Wasserstein metric is O(1/sqrt(n)), where n is the number of hidden neurons. We suspect this rate is asymptotically sharp. We also improve the known convergence rates for other activations: to power-law in n for ReLU, and to inverse-square-root up to logarithmic factors for erf. We explore the interplay between spherical harmonics, Stein kernels and optimal transport in the non-isotropic setting.
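The setting can be illustrated numerically with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: standard Gaussian weights, the 1/sqrt(n) output scaling, a monomial activation t^degree, and evaluation at a single unit-norm input. The helper names `sample_outputs` and `excess_kurtosis` are hypothetical. Excess kurtosis is used only as a crude proxy for non-Gaussianity; it is not the 2-Wasserstein distance discussed in the abstract.

```python
import numpy as np


def sample_outputs(n_hidden, x, n_draws, rng, degree=2):
    """Evaluate n_draws independent random one-hidden-layer networks at input x.

    Assumed model (illustrative only):
        f(x) = n_hidden**-0.5 * sum_i v_i * (w_i . x)**degree,
    with all entries of w_i and v_i drawn i.i.d. from N(0, 1).
    """
    d = x.shape[0]
    w = rng.standard_normal((n_draws, n_hidden, d))  # hidden-layer weights
    v = rng.standard_normal((n_draws, n_hidden))     # output-layer weights
    pre = w @ x                                      # pre-activations, shape (n_draws, n_hidden)
    return (v * pre**degree).sum(axis=1) / np.sqrt(n_hidden)


def excess_kurtosis(samples):
    """Sample excess kurtosis; it is 0 for an exactly Gaussian distribution."""
    centered = samples - samples.mean()
    return (centered**4).mean() / (centered**2).mean() ** 2 - 3.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([1.0, 0.0, 0.0])  # unit-norm input, so each w_i . x ~ N(0, 1)
    for n_hidden in (1, 16, 64):
        y = sample_outputs(n_hidden, x, n_draws=20_000, rng=rng)
        # Under the assumptions above with degree=2, the exact output variance
        # is 3 and the exact excess kurtosis is 32 / n_hidden.
        print(n_hidden, y.var(), excess_kurtosis(y))
```

In this toy model the output at a fixed input is a normalized sum of n independent terms, so its variance stays fixed while its excess kurtosis shrinks as the width grows, consistent with the output distribution approaching a Gaussian. The Monte Carlo kurtosis estimate is noisy at small widths because the summands are heavy-tailed.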
Cite
Text
Klukowski. "Rate of Convergence of Polynomial Networks to Gaussian Processes." Conference on Learning Theory, 2022.
Markdown
[Klukowski. "Rate of Convergence of Polynomial Networks to Gaussian Processes." Conference on Learning Theory, 2022.](https://mlanthology.org/colt/2022/klukowski2022colt-rate/)
BibTeX
@inproceedings{klukowski2022colt-rate,
title = {{Rate of Convergence of Polynomial Networks to Gaussian Processes}},
author = {Klukowski, Adam},
booktitle = {Conference on Learning Theory},
year = {2022},
pages = {701--722},
volume = {178},
url = {https://mlanthology.org/colt/2022/klukowski2022colt-rate/}
}