Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent
Abstract
We establish a comprehensive finite-sample and asymptotic theory for stochastic gradient descent (SGD) with constant learning rates. First, we propose a novel linear approximation technique to provide a quenched central limit theorem (CLT) for SGD iterates with refined tail properties, showing that regardless of the chosen initialization, the fluctuations of the algorithm around its target point converge to a multivariate normal distribution. Our conditions are substantially milder than those required in the classical CLTs for SGD, yet they yield a stronger convergence result. Furthermore, we derive the first Berry-Esseen bound -- the Gaussian approximation error -- for constant learning-rate SGD, which is sharp compared to the decaying learning-rate schemes in the literature. Beyond moment convergence, we also provide a Nagaev-type inequality for the SGD tail probabilities by adopting autoregressive approximation techniques, which entails non-asymptotic large-deviation guarantees. These results are verified via numerical simulations, paving the way for theoretically grounded uncertainty quantification, especially with non-asymptotic validity.
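As a rough illustration of the setting, the sketch below runs constant learning-rate SGD on a one-dimensional toy problem (estimating a mean under squared loss) from several initializations and looks at the rescaled fluctuations around the target. The toy model, the uniform noise, the function names, and the scaling by the square root of the learning rate are all assumptions made for this example and are not taken from the paper. In this toy model the iterates form an AR(1) recursion, so the stationary variance of the rescaled fluctuation is sigma^2 / (2 - eta).

```python
import random
import statistics

TARGET = 1.0   # hypothetical target point theta* (the mean being estimated)
SIGMA = 1.0    # standard deviation of the observation noise (assumed)


def run_sgd(theta0, eta, n_steps, rng):
    """Constant learning-rate SGD on f(theta) = E[(theta - X)^2] / 2.

    X = TARGET + uniform noise with variance SIGMA^2 (an assumed toy model).
    The update theta <- (1 - eta) * theta + eta * X is an AR(1) recursion.
    """
    half_width = SIGMA * 3 ** 0.5  # uniform(-a, a) has variance a^2 / 3
    theta = theta0
    for _ in range(n_steps):
        x = TARGET + rng.uniform(-half_width, half_width)
        theta -= eta * (theta - x)  # stochastic gradient of the loss is (theta - x)
    return theta


def simulate(eta=0.1, n_steps=200, n_chains=4000, seed=0):
    """Run independent chains, cycling through three different initializations."""
    rng = random.Random(seed)
    inits = [-5.0, 0.0, 5.0]
    return [run_sgd(inits[i % len(inits)], eta, n_steps, rng)
            for i in range(n_chains)]


eta = 0.1
finals = simulate(eta=eta)

# Rescale the fluctuations around the target by sqrt(eta).
scaled = [(theta - TARGET) / eta ** 0.5 for theta in finals]
emp_mean = statistics.fmean(scaled)
emp_var = statistics.pvariance(scaled)

# Stationary variance of the AR(1) recursion, divided by eta.
theory_var = SIGMA ** 2 / (2.0 - eta)

print("empirical mean:", emp_mean)
print("empirical variance:", emp_var, "theory:", theory_var)
```

With these settings the influence of the starting point decays like (1 - eta)^n_steps, so the empirical mean and variance should be close to 0 and sigma^2 / (2 - eta) whichever initialization a chain used. A histogram or normal Q-Q plot of `scaled` would be a natural next check of approximate normality.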
Cite
Text
Wei et al. "Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent." Advances in Neural Information Processing Systems, 2025.
Markdown
[Wei et al. "Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wei2025neurips-gaussian/)
BibTeX
@inproceedings{wei2025neurips-gaussian,
title = {{Gaussian Approximation and Concentration of Constant Learning-Rate Stochastic Gradient Descent}},
author = {Wei, Ziyang and Li, Jiaqi and Lou, Zhipeng and Wu, Wei Biao},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/wei2025neurips-gaussian/}
}