Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees

Abstract

In this paper, we present a comprehensive study on the convergence properties of Adam-family methods for nonsmooth optimization, especially in the training of nonsmooth neural networks. We introduce a novel framework that adopts a two-timescale updating scheme and prove its convergence under mild assumptions. The proposed framework encompasses various popular Adam-family methods, providing convergence guarantees for these methods in training nonsmooth neural networks. Furthermore, we develop stochastic subgradient methods that incorporate gradient clipping techniques for training nonsmooth neural networks with heavy-tailed noise. Through our framework, we show that the proposed methods converge even when the evaluation noise is only assumed to be integrable. Extensive numerical experiments demonstrate the efficiency and robustness of our proposed methods.
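The sketch below is a minimal illustration (not the authors' reference implementation) of the kind of update the abstract describes: an Adam-style iteration in which the first- and second-moment estimates are driven by stepsizes on two different timescales, with the stochastic subgradient clipped to handle heavy-tailed noise. All names, stepsize choices, and constants (`two_timescale_adam_step`, `theta_k`, `gamma_k`, `tau`, the Student-t noise) are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def clip_subgradient(g, tau):
    """Rescale g so its Euclidean norm is at most tau (for heavy-tailed noise)."""
    norm = np.linalg.norm(g)
    return g if norm <= tau else g * (tau / norm)


def two_timescale_adam_step(x, m, v, g, k, alpha=1e-3, eps=1e-8, tau=10.0):
    """One illustrative Adam-style update with two stepsize timescales.

    Here the first-moment estimate m uses the faster stepsize theta_k and the
    second-moment estimate v uses the slower stepsize gamma_k; which quantity
    runs on which timescale is an illustrative choice, not necessarily the
    arrangement analyzed in the paper.
    """
    g = clip_subgradient(g, tau)            # gradient clipping for heavy tails
    theta_k = 1.0 / np.sqrt(k + 1)          # faster timescale (illustrative)
    gamma_k = 1.0 / (k + 1)                 # slower timescale (illustrative)
    m = (1.0 - theta_k) * m + theta_k * g   # first-moment (momentum) estimate
    v = (1.0 - gamma_k) * v + gamma_k * g * g  # second-moment (scaling) estimate
    x = x - alpha * m / (np.sqrt(v) + eps)  # Adam-style preconditioned step
    return x, m, v


# Toy usage on f(x) = ||x||^2 / 2 with heavy-tailed (Student-t) noise, purely
# to show how the pieces fit together.
rng = np.random.default_rng(0)
x, m, v = np.ones(5), np.zeros(5), np.zeros(5)
for k in range(1000):
    g = x + rng.standard_t(df=2, size=5)
    x, m, v = two_timescale_adam_step(x, m, v, g, k)
```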

Cite

Text

Xiao et al. "Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees." Journal of Machine Learning Research, 2024.

Markdown

[Xiao et al. "Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/xiao2024jmlr-adamfamily/)

BibTeX

@article{xiao2024jmlr-adamfamily,
  title     = {{Adam-Family Methods for Nonsmooth Optimization with Convergence Guarantees}},
  author    = {Xiao, Nachuan and Hu, Xiaoyin and Liu, Xin and Toh, Kim-Chuan},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--53},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/xiao2024jmlr-adamfamily/}
}