Non-Gaussian Tensor Programs

Abstract

Does it matter whether one randomly initializes a neural network (NN) from Gaussian, uniform, or other distributions? We show the answer is "yes" in some parameter tensors (the so-called matrix-like parameters) but "no" in others when the NN is wide. This is a specific instance of a more general universality principle for Tensor Programs (TP) that informs precisely when the limit of a program depends on the distribution of its initial matrices and vectors. To obtain this principle, we develop the theory of non-Gaussian Tensor Programs. As corollaries, we obtain all previous consequences of the TP framework (such as NNGP/NTK correspondence, Free Independence Principle, Dynamical Dichotomy Theorem, and μ-parametrization) for NNs with non-Gaussian weights.
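
The abstract's question, whether the initialization distribution of a given parameter tensor leaves a trace at large width, can be probed with a toy experiment. The NumPy sketch below is not from the paper: the two-hidden-layer tanh MLP, the widths, the excess-kurtosis summary statistic, and the helper names (`uniform_like_gaussian`, `excess_kurtosis`, `second_layer_preacts`) are illustrative assumptions. It prints coordinate statistics of the second-layer pre-activations when either the hidden weight matrix or the second-layer bias is drawn from a Gaussian versus a variance-matched uniform distribution.

```python
# Toy sketch (not the paper's setup): compare wide-MLP pre-activation
# statistics when one parameter tensor is Gaussian vs. variance-matched
# uniform, at two widths.
import numpy as np

rng = np.random.default_rng(0)

def uniform_like_gaussian(shape, std):
    """Zero-mean uniform entries with the same std as N(0, std^2)."""
    half_width = std * np.sqrt(3.0)
    return rng.uniform(-half_width, half_width, size=shape)

def excess_kurtosis(z):
    """Empirical excess kurtosis of a 1-D sample (0 for a Gaussian)."""
    z = (z - z.mean()) / z.std()
    return float(np.mean(z**4) - 3.0)

def second_layer_preacts(n, hidden_dist, bias_dist, d=10):
    """Coordinates of h2 = W2 @ tanh(h1) / sqrt(n) + b2 for one input x."""
    x = rng.standard_normal(d)
    W1 = rng.standard_normal((n, d)) / np.sqrt(d)
    b1 = rng.standard_normal(n)
    h1 = np.tanh(W1 @ x + b1)

    sample = {"gaussian": rng.standard_normal,
              "uniform": lambda size: uniform_like_gaussian(size, 1.0)}
    W2 = sample[hidden_dist]((n, n))   # hidden-layer weight matrix
    b2 = sample[bias_dist](n)          # second-layer bias vector
    return W2 @ h1 / np.sqrt(n) + b2

for n in (256, 4096):
    for hidden_dist, bias_dist in [("gaussian", "gaussian"),
                                   ("uniform", "gaussian"),
                                   ("gaussian", "uniform")]:
        h2 = second_layer_preacts(n, hidden_dist, bias_dist)
        print(f"n={n:5d}  W2~{hidden_dist:8s} b2~{bias_dist:8s}  "
              f"std={h2.std():.3f}  excess kurtosis={excess_kurtosis(h2):+.3f}")
```

Comparing the printed statistics across widths indicates which parameter tensor's initialization distribution still shows in the pre-activation statistics as the network grows wide; the abstract's universality principle is the precise statement of that distinction.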

Cite

Text

Golikov and Yang. "Non-Gaussian Tensor Programs." Neural Information Processing Systems, 2022.

Markdown

[Golikov and Yang. "Non-Gaussian Tensor Programs." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/golikov2022neurips-nongaussian/)

BibTeX

@inproceedings{golikov2022neurips-nongaussian,
  title     = {{Non-Gaussian Tensor Programs}},
  author    = {Golikov, Eugene and Yang, Greg},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/golikov2022neurips-nongaussian/}
}