On Convergence of Adam for Stochastic Optimization Under Relaxed Assumptions
Abstract
In this paper, we study Adam in non-convex smooth scenarios with potentially unbounded gradients and affine variance noise. We consider a general noise model which covers affine variance noise, bounded noise, and sub-Gaussian noise. We show that Adam with a specific hyper-parameter setup can find a stationary point with a $\mathcal{O}(\text{poly}(\log T)/\sqrt{T})$ rate in high probability under this general noise model, where $T$ denotes the total number of iterations, matching the lower bound for stochastic first-order algorithms up to logarithmic factors. We also provide a probabilistic convergence result for Adam under a generalized smoothness condition which allows unbounded smoothness parameters and has been shown empirically to capture the smoothness of many practical objective functions more accurately.
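For context, a minimal sketch of the standard Adam recursion and a commonly stated form of the affine variance noise condition is given below; the step sizes $\eta_t$ and constants $\beta_1, \beta_2, \epsilon, \sigma_0, \sigma_1$ are illustrative, bias-correction terms are omitted, and the paper's specific hyper-parameter setup and general noise model are not reproduced here.

$$
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{\odot 2}, \qquad
x_{t+1} = x_t - \eta_t\, \frac{m_t}{\sqrt{v_t} + \epsilon},
$$

where $g_t$ is a stochastic gradient of the objective $f$ at $x_t$ and the division is coordinate-wise. A commonly used affine variance noise condition posits constants $\sigma_0, \sigma_1 \ge 0$ such that

$$
\mathbb{E}\big[\|g_t - \nabla f(x_t)\|^2 \,\big|\, x_t\big] \le \sigma_0^2 + \sigma_1^2\, \|\nabla f(x_t)\|^2,
$$

which reduces to the bounded-variance case when $\sigma_1 = 0$.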
Cite
Text
Hong and Lin. "On Convergence of Adam for Stochastic Optimization Under Relaxed Assumptions." Neural Information Processing Systems, 2024. doi:10.52202/079017-0346
Markdown
[Hong and Lin. "On Convergence of Adam for Stochastic Optimization Under Relaxed Assumptions." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/hong2024neurips-convergence/) doi:10.52202/079017-0346
BibTeX
@inproceedings{hong2024neurips-convergence,
title = {{On Convergence of Adam for Stochastic Optimization Under Relaxed Assumptions}},
author = {Hong, Yusu and Lin, Junhong},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-0346},
url = {https://mlanthology.org/neurips/2024/hong2024neurips-convergence/}
}