Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions
Abstract
We study stochastic gradient descent (SGD) with gradient clipping on convex functions under a generalized smoothness assumption called $(L_0,L_1)$-smoothness. Using gradient clipping, we establish a high-probability convergence rate that matches the SGD rate in the $L$-smooth case up to polylogarithmic factors and additive terms. We also propose a variation of adaptive SGD with gradient clipping, which achieves the same guarantee. We perform experiments to examine our theory and algorithmic choices.
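For concreteness, below is a minimal sketch of SGD with gradient clipping, the basic update the abstract refers to. This is an illustration, not the paper's exact algorithm or analysis; the step size `eta` and clipping threshold `c` are placeholder assumptions rather than the theoretically prescribed choices.

```python
import numpy as np

def clipped_sgd(grad_fn, x0, eta=0.1, c=1.0, num_steps=1000, rng=None):
    """SGD with gradient clipping: each stochastic gradient is rescaled to
    have norm at most c before taking the step. eta and c are illustrative
    placeholders, not the paper's theoretical parameter choices."""
    x = np.array(x0, dtype=float)
    for _ in range(num_steps):
        g = grad_fn(x, rng)                 # stochastic gradient estimate
        norm = np.linalg.norm(g)
        if norm > c:
            g = g * (c / norm)              # clip: g <- min(1, c/||g||) * g
        x = x - eta * g                     # standard SGD step on clipped gradient
    return x

# Toy usage: f(x) = 0.5 ||x||^2 with additive Gaussian gradient noise.
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_final = clipped_sgd(noisy_grad, x0=np.ones(5), rng=rng)
print(np.linalg.norm(x_final))
```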
Cite
Text
Gaash et al. "Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions." Advances in Neural Information Processing Systems, 2025.
Markdown
[Gaash et al. "Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/gaash2025neurips-convergence/)
BibTeX
@inproceedings{gaash2025neurips-convergence,
title = {{Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions}},
author = {Gaash, Ofir and Levy, Kfir Yehuda and Carmon, Yair},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/gaash2025neurips-convergence/}
}