Convergence of Adam for Non-Convex Objectives: Relaxed Hyperparameters and Non-Ergodic Case

Abstract

Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical applications. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which is arbitrarily close to o(1/√K). More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain the non-ergodic convergence rate of O(1/K) for function values under the Polyak-Łojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems. Numerical experiments validate the effectiveness of Adam and support our theoretical findings.
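As background, here is a minimal sketch of the vanilla Adam update and of the two notions of convergence the abstract contrasts, written in the standard formulation of Kingma and Ba rather than the paper's own notation; the symbols α_k, β₁, β₂, ε and the exact criteria below are assumptions for illustration, not quoted from the paper.

\[
\begin{aligned}
m_k &= \beta_1 m_{k-1} + (1-\beta_1)\, g_k, \qquad
v_k = \beta_2 v_{k-1} + (1-\beta_2)\, g_k^{2}, \\
x_{k+1} &= x_k - \frac{\alpha_k}{\sqrt{v_k} + \epsilon}\, m_k,
\end{aligned}
\]

where $g_k$ is a stochastic gradient of the objective $f$ at $x_k$. In this language, an ergodic guarantee bounds an averaged or best-iterate quantity such as $\min_{1\le k\le K} \mathbb{E}\,\|\nabla f(x_k)\|^{2}$, whereas a non-ergodic (last-iterate) guarantee bounds $\|\nabla f(x_K)\|^{2}$ for the iterate that is actually returned in practice. The Polyak-Łojasiewicz condition invoked for the $O(1/K)$ function-value rate is the standard inequality $\|\nabla f(x)\|^{2} \ge 2\mu\,\bigl(f(x) - f^{*}\bigr)$ for some $\mu > 0$.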

Cite

Text

Liang et al. "Convergence of Adam for Non-Convex Objectives: Relaxed Hyperparameters and Non-Ergodic Case." Machine Learning, 2025. doi:10.1007/s10994-025-06737-w

Markdown

[Liang et al. "Convergence of Adam for Non-Convex Objectives: Relaxed Hyperparameters and Non-Ergodic Case." Machine Learning, 2025.](https://mlanthology.org/mlj/2025/liang2025mlj-convergence/) doi:10.1007/s10994-025-06737-w

BibTeX

@article{liang2025mlj-convergence,
  title     = {{Convergence of Adam for Non-Convex Objectives: Relaxed Hyperparameters and Non-Ergodic Case}},
  author    = {Liang, Yuqing and He, Meixuan and Liu, Jinlan and Xu, Dongpo},
  journal   = {Machine Learning},
  year      = {2025},
  pages     = {75},
  doi       = {10.1007/s10994-025-06737-w},
  volume    = {114},
  url       = {https://mlanthology.org/mlj/2025/liang2025mlj-convergence/}
}