ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate

Abstract

Adam is one of the most popular optimization algorithms in deep learning. However, it is known that Adam does not converge in theory unless a hyperparameter, i.e., $\beta_2$, is chosen in a problem-dependent manner. There have been many attempts to fix the non-convergence (e.g., AMSGrad), but they require an impractical assumption that the gradient noise is uniformly bounded. In this paper, we propose a new adaptive gradient method named ADOPT, which achieves the optimal convergence rate of $\mathcal{O} ( 1 / \sqrt{T} )$ with any choice of $\beta_2$ without depending on the bounded noise assumption. ADOPT addresses the non-convergence issue of Adam by removing the current gradient from the second moment estimate and changing the order of the momentum update and the normalization by the second moment estimate. We also conduct extensive numerical experiments and verify that ADOPT achieves superior results compared to Adam and its variants across a wide range of tasks, including image classification, generative modeling, natural language processing, and deep reinforcement learning. The implementation is available at https://github.com/iShohei220/adopt.
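The abstract describes two changes relative to Adam: the current gradient is excluded from the second-moment estimate used for normalization, and the gradient is normalized *before* the momentum (first-moment) update rather than after. The sketch below illustrates this update order on a scalar parameter; it is a minimal reading of the abstract, not the official implementation (see the linked repository for that), and details such as the `max(sqrt(v), eps)` guard and the initialization of `v` from the first gradient are assumptions.

```python
import math

def adopt_step(theta, g, m, v, lr=0.01, beta1=0.9, beta2=0.9999, eps=1e-6):
    """One ADOPT-style update, sketched from the abstract's description.

    Unlike Adam, the current gradient g is NOT folded into the second
    moment v before normalization, and normalization happens before the
    momentum update instead of after it.
    """
    # Normalize the current gradient by the *previous* second moment.
    g_hat = g / max(math.sqrt(v), eps)
    # Momentum update on the already-normalized gradient.
    m = beta1 * m + (1.0 - beta1) * g_hat
    # Parameter step uses the momentum directly (no further normalization).
    theta = theta - lr * m
    # Only now fold the current gradient into the second-moment estimate,
    # so it affects normalization from the next step onward.
    v = beta2 * v + (1.0 - beta2) * g * g
    return theta, m, v

# Toy usage: minimize f(theta) = theta**2, whose gradient is 2 * theta.
theta, m = 1.0, 0.0
v = (2.0 * theta) ** 2  # assumed: v initialized from the first squared gradient
for _ in range(2000):
    g = 2.0 * theta
    theta, m, v = adopt_step(theta, g, m, v)
print(theta)  # close to the minimum at 0
```

Note that because `v` is updated after the parameter step, the first iteration already has a well-defined normalizer; Adam, by contrast, divides by an estimate that includes the current (possibly noisy) gradient, which is the correlation the paper identifies as the source of non-convergence.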

Cite

Text

Taniguchi et al. "ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate." Neural Information Processing Systems, 2024. doi:10.52202/079017-2309

Markdown

[Taniguchi et al. "ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/taniguchi2024neurips-adopt/) doi:10.52202/079017-2309

BibTeX

@inproceedings{taniguchi2024neurips-adopt,
  title     = {{ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate}},
  author    = {Taniguchi, Shohei and Harada, Keno and Minegishi, Gouki and Oshima, Yuta and Jeong, Seong Cheol and Nagahara, Go and Iiyama, Tomoshi and Suzuki, Masahiro and Iwasawa, Yusuke and Matsuo, Yutaka},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2309},
  url       = {https://mlanthology.org/neurips/2024/taniguchi2024neurips-adopt/}
}