High-Probability Bounds for the Last Iterate of Clipped SGD

Abstract

We study the problem of minimizing a convex objective when only noisy gradient estimates are available. Assuming that stochastic gradients have finite $\alpha$-th moments for some $\alpha \in (1,2]$, we establish, for the first time, a high-probability convergence guarantee for the last iterate of clipped stochastic gradient descent (Clipped-SGD) on smooth objectives. In particular, we prove a rate of $1/K^{(2\alpha-2)/(3\alpha)}$ with only polylogarithmic dependence on the confidence parameter. In addition, we introduce a new technique for deriving in-expectation convergence guarantees from high-probability bounds for methods with almost surely bounded updates, and apply it to obtain expectation guarantees for Clipped-SGD. Finally, we complement our theoretical analysis with empirical results that support and illustrate our findings.
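For readers unfamiliar with the method, the following is a minimal sketch of a generic Clipped-SGD loop that returns its last iterate, together with a helper that evaluates the exponent $(2\alpha-2)/(3\alpha)$ from the stated rate. The function names (`clip`, `clipped_sgd`, `rate_exponent`), the constant step size and clipping level, and the Student-t noise in the demo are all illustrative assumptions; the abstract does not specify the paper's parameter schedules or experimental setup.

```python
import numpy as np


def clip(g, clip_level):
    """Rescale g so that its Euclidean norm is at most clip_level."""
    norm = np.linalg.norm(g)
    if norm <= clip_level:
        return g
    return g * (clip_level / norm)


def clipped_sgd(grad_oracle, x0, step_size, clip_level, num_steps):
    """Run SGD with norm clipping and return the last iterate.

    Constant step_size and clip_level are assumptions made for this sketch,
    not the schedules analyzed in the paper.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_oracle(x)
        # Each update moves x by at most step_size * clip_level.
        x = x - step_size * clip(g, clip_level)
    return x


def rate_exponent(alpha):
    """Exponent e in the 1/K^e rate quoted in the abstract."""
    return (2.0 * alpha - 2.0) / (3.0 * alpha)


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Hypothetical test problem: f(x) = 0.5 * ||x||^2 with heavy-tailed noise.
    # Student-t noise with 1.5 degrees of freedom has infinite variance but
    # finite moments of every order below 1.5.
    def noisy_grad(x):
        return x + rng.standard_t(df=1.5, size=x.shape)

    x_last = clipped_sgd(noisy_grad, x0=np.full(5, 10.0),
                         step_size=0.01, clip_level=5.0, num_steps=5000)
    print("distance to minimizer:", np.linalg.norm(x_last))
    print("rate exponent for alpha = 1.5:", rate_exponent(1.5))
```

Because the clipped gradient has norm at most `clip_level`, every update is bounded almost surely, which is the property the abstract relies on when passing from high-probability to in-expectation guarantees. For $\alpha = 2$ the exponent equals $1/3$, and it tends to $0$ as $\alpha \to 1$.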

Cite

Text

Chezhegov et al. "High-Probability Bounds for the Last Iterate of Clipped SGD." International Conference on Learning Representations, 2026.

Markdown

[Chezhegov et al. "High-Probability Bounds for the Last Iterate of Clipped SGD." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chezhegov2026iclr-highprobability/)

BibTeX

@inproceedings{chezhegov2026iclr-highprobability,
  title     = {{High-Probability Bounds for the Last Iterate of Clipped SGD}},
  author    = {Chezhegov, Savelii and Parletta, Daniela Angela and Paudice, Andrea and Gorbunov, Eduard},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/chezhegov2026iclr-highprobability/}
}