Large-Width Asymptotics and Training Dynamics of $\alpha$-Stable ReLU Neural Networks

Abstract

Large-width asymptotic properties of neural networks (NNs) with Gaussian distributed weights have been extensively investigated in the literature, with major results characterizing their large-width asymptotic behavior in terms of Gaussian processes and their large-width training dynamics in terms of the neural tangent kernel (NTK). In this paper, we study large-width asymptotics and training dynamics of $\alpha$-Stable ReLU-NNs, namely NNs with ReLU activation function and $\alpha$-Stable distributed weights, with $\alpha\in(0,2)$. For $\alpha\in(0,2]$, $\alpha$-Stable distributions form a broad class of heavy-tailed distributions, with the special case $\alpha=2$ corresponding to the Gaussian distribution. Firstly, we show that if the NN's width goes to infinity, then a rescaled $\alpha$-Stable ReLU-NN converges weakly (in distribution) to an $\alpha$-Stable process, which generalizes the Gaussian process. In contrast to the Gaussian setting, our result shows that the activation function affects the scaling of the $\alpha$-Stable NN; more precisely, in order to achieve the infinite-width $\alpha$-Stable process, the ReLU activation requires an additional logarithmic term in the scaling compared to sub-linear activations. Secondly, we characterize the large-width training dynamics of $\alpha$-Stable ReLU-NNs in terms of an infinite-width random kernel, referred to as the $\alpha$-Stable NTK, and we show that gradient descent achieves zero training error at a linear rate, for a sufficiently large width, with high probability. Unlike the NTK arising in the Gaussian setting, the $\alpha$-Stable NTK is a random kernel; more precisely, the randomness of the $\alpha$-Stable ReLU-NN at initialization does not vanish in the large-width training dynamics.
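The following is a minimal sketch (not the authors' code) of the object the abstract describes: a one-hidden-layer ReLU network with symmetric $\alpha$-Stable weights, rescaled as the width $n$ grows. The $(n \log n)^{-1/\alpha}$ rescaling used here is an assumption for illustration, meant to reflect the abstract's "additional logarithmic term" for ReLU; the paper's exact scaling and constants may differ. The helper name `stable_relu_nn` is hypothetical; the sampler `scipy.stats.levy_stable` is a standard SciPy API for $\alpha$-Stable draws.

```python
# Sketch: rescaled one-hidden-layer alpha-Stable ReLU network.
# Assumption (for illustration only): output rescaled by (n log n)^(-1/alpha);
# the paper derives the precise logarithmic correction required by ReLU.
import numpy as np
from scipy.stats import levy_stable


def stable_relu_nn(x, n, alpha, seed=0):
    """Output of a rescaled width-n alpha-Stable ReLU network at input x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Symmetric alpha-Stable weights (skewness beta = 0).
    w1 = levy_stable.rvs(alpha, 0.0, size=(n, d), random_state=rng)
    w2 = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)
    hidden = np.maximum(w1 @ x, 0.0)  # ReLU activations
    return (n * np.log(n)) ** (-1.0 / alpha) * (w2 @ hidden)


x = np.array([1.0, -0.5])
for n in [50, 500, 5000]:
    samples = [stable_relu_nn(x, n, alpha=1.5, seed=s) for s in range(300)]
    # Heavy tails make the empirical variance unstable, so we report the
    # median absolute value as a robust proxy for the scale of the output.
    print(n, np.median(np.abs(samples)))
```

If the assumed rescaling matches the correct one, the printed scale proxy should roughly stabilize as $n$ grows, with the output law approaching a heavy-tailed $\alpha$-Stable limit rather than a Gaussian; this heaviness of tails at initialization is also what makes the $\alpha$-Stable NTK a random, rather than deterministic, kernel.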

Cite

Text

Favaro et al. "Large-Width Asymptotics and Training Dynamics of $\alpha$-Stable ReLU Neural Networks." Transactions on Machine Learning Research, 2024.

Markdown

[Favaro et al. "Large-Width Asymptotics and Training Dynamics of $\alpha$-Stable ReLU Neural Networks." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/favaro2024tmlr-largewidth/)

BibTeX

@article{favaro2024tmlr-largewidth,
  title     = {{Large-Width Asymptotics and Training Dynamics of $\alpha$-Stable ReLU Neural Networks}},
  author    = {Favaro, Stefano and Fortini, Sandra and Peluchetti, Stefano},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/favaro2024tmlr-largewidth/}
}