Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation

Sheshukova, Marina; Belomestny, Denis; Durmus, Alain Oliviero; Moulines, Eric; Naumov, Alexey; Samsonov, Sergey

Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation

Marina Sheshukova, Denis Belomestny, Alain Oliviero Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov

ICLR 2025

/iclr/2025/sheshukova2025iclr-nonasymptotic/

Abstract

We address the problem of solving strongly convex and smooth minimization problems using stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested to combine the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ is best known. We also extend this result to the higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.

PDF ICLR Semantic Scholar

Cite

Text

Sheshukova et al. "Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation." International Conference on Learning Representations, 2025.

Markdown

[Sheshukova et al. "Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/sheshukova2025iclr-nonasymptotic/)

BibTeX

@inproceedings{sheshukova2025iclr-nonasymptotic,
  title     = {{Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson–Romberg Extrapolation}},
  author    = {Sheshukova, Marina and Belomestny, Denis and Durmus, Alain Oliviero and Moulines, Eric and Naumov, Alexey and Samsonov, Sergey},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/sheshukova2025iclr-nonasymptotic/}
}