High Probability Bounds for Non-Convex Stochastic Optimization with Momentum
Abstract
Stochastic gradient descent with momentum (SGDM) is widely used in machine learning, yet high-probability learning bounds for SGDM in non-convex settings remain scarce. In this paper, we provide high-probability convergence bounds and generalization bounds for SGDM. First, we establish such bounds for the gradient norm in the general non-convex case. The resulting convergence bounds are tighter than existing theoretical results, and the obtained generalization bounds seem to be the first for SGDM. Next, under the Polyak-{\L}ojasiewicz condition, we derive bounds for the function-value error instead of the gradient norm, and the corresponding learning rates are faster than in the general non-convex case. Finally, by additionally assuming a mild Bernstein condition on the gradient, we obtain even sharper generalization bounds whose learning rates can reach $\widetilde{\mathcal{O}}(1/n^2)$ in the low-noise regime, where $n$ is the sample size. Overall, we provide a systematic study of high-probability learning bounds for non-convex SGDM.
Cite
Text
Li et al. "High Probability Bounds for Non-Convex Stochastic Optimization with Momentum." International Conference on Learning Representations, 2026.Markdown
[Li et al. "High Probability Bounds for Non-Convex Stochastic Optimization with Momentum." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-high/)BibTeX
@inproceedings{li2026iclr-high,
title = {{High Probability Bounds for Non-Convex Stochastic Optimization with Momentum}},
author = {Li, Shaojie and Tang, Pengwei and Zhu, Bowei and Liu, Yong},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/li2026iclr-high/}
}