Convergence Guarantees for RMSProp and Adam in Generalized-Smooth Non-Convex Optimization with Affine Noise Variance
Abstract
This paper provides the first tight convergence analyses for RMSProp and Adam for non-convex optimization under the most relaxed assumptions of coordinate-wise generalized smoothness and affine noise variance. We first analyze RMSProp, which is a special case of Adam with adaptive learning rates but without first-order momentum. Specifically, to address the challenges arising from the dependence among the adaptive update, the unbounded gradient estimate, and the Lipschitz constant, we demonstrate that the first-order term in the descent lemma converges and that its denominator is upper bounded by a function of the gradient norm. Based on this result, we show that RMSProp with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. We then generalize our analysis to Adam, where the additional challenge is a mismatch between the gradient and the first-order momentum. We develop a new upper bound on the first-order term in the descent lemma, which is also a function of the gradient norm. We show that Adam with proper hyperparameters converges to an $\epsilon$-stationary point with an iteration complexity of $\mathcal O(\epsilon^{-4})$. Our complexity results for both RMSProp and Adam match the complexity lower bound established in Arjevani et al. (2023).
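The relationship stated above, that RMSProp is Adam without first-order momentum, can be illustrated with a minimal sketch. It is not the paper's algorithm statement. The function names, the toy objective, the default hyperparameters, the omission of bias correction, and the placement of `eps` outside the square root are all assumptions made for illustration, and the gradients here are deterministic rather than stochastic.

```python
import math

def adam_step(x, grad, m, v, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One coordinate-wise Adam step (assumed form, no bias correction)."""
    # First-order momentum: exponential moving average of gradients.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]
    # Second-moment estimate: exponential moving average of squared gradients.
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Adaptive update: each coordinate is scaled by its own sqrt(v).
    x = [xi - lr * mi / (math.sqrt(vi) + eps) for xi, mi, vi in zip(x, m, v)]
    return x, m, v

def rmsprop_step(x, grad, v, lr=0.01, beta2=0.999, eps=1e-8):
    """One coordinate-wise RMSProp step (assumed form)."""
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Same adaptive scaling, but the raw gradient replaces the momentum.
    x = [xi - lr * gi / (math.sqrt(vi) + eps) for xi, gi, vi in zip(x, grad, v)]
    return x, v

def compare_on_toy_problem(steps=50):
    """Run Adam with beta1 = 0 and RMSProp side by side.

    The objective f(x) = sum(x_i**4) / 4 is a hypothetical test function
    with gradient x_i**3; it is not taken from the paper.
    """
    x_adam = [1.5, -2.0]
    x_rms = list(x_adam)
    m = [0.0, 0.0]
    v_adam = [0.0, 0.0]
    v_rms = [0.0, 0.0]
    for _ in range(steps):
        g_adam = [xi ** 3 for xi in x_adam]
        g_rms = [xi ** 3 for xi in x_rms]
        x_adam, m, v_adam = adam_step(x_adam, g_adam, m, v_adam, beta1=0.0)
        x_rms, v_rms = rmsprop_step(x_rms, g_rms, v_rms)
    return x_adam, x_rms
```

With `beta1=0.0` the momentum `m` equals the current gradient at every step, so the two trajectories returned by `compare_on_toy_problem()` coincide. With `beta1 > 0`, `m` generally differs from the current gradient, which is the mismatch the abstract identifies as the additional difficulty in the Adam analysis.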
Cite
Text
Zhang et al. "Convergence Guarantees for RMSProp and Adam in Generalized-Smooth Non-Convex Optimization with Affine Noise Variance." Transactions on Machine Learning Research, 2025.
Markdown
[Zhang et al. "Convergence Guarantees for RMSProp and Adam in Generalized-Smooth Non-Convex Optimization with Affine Noise Variance." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/zhang2025tmlr-convergence/)
BibTeX
@article{zhang2025tmlr-convergence,
title = {{Convergence Guarantees for RMSProp and Adam in Generalized-Smooth Non-Convex Optimization with Affine Noise Variance}},
author = {Zhang, Qi and Zhou, Yi and Zou, Shaofeng},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/zhang2025tmlr-convergence/}
}