Fundamental Benefit of Alternating Updates in Minimax Optimization

Abstract

The Gradient Descent-Ascent (GDA) algorithm, designed to solve minimax optimization problems, takes the descent and ascent steps either simultaneously (Sim-GDA) or alternately (Alt-GDA). While Alt-GDA is commonly observed to converge faster, the performance gap between the two is not yet well understood theoretically, especially in terms of global convergence rates. To address this theory-practice gap, we present fine-grained convergence analyses of both algorithms for strongly-convex-strongly-concave and Lipschitz-gradient objectives. Our new iteration complexity upper bound for Alt-GDA is strictly smaller than the lower bound for Sim-GDA; i.e., Alt-GDA is provably faster. Moreover, we propose Alternating-Extrapolation GDA (Alex-GDA), a general algorithmic framework that subsumes Sim-GDA and Alt-GDA, whose main idea is to alternately take gradients from extrapolations of the iterates. We show that Alex-GDA satisfies a smaller iteration complexity bound, identical to that of the Extra-gradient method, while requiring fewer gradient computations. We also prove that Alex-GDA enjoys linear convergence for bilinear problems, for which both Sim-GDA and Alt-GDA fail to converge at all.
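To make the difference between the update rules concrete, here is a minimal sketch in Python on a toy strongly-convex-strongly-concave quadratic (the objective, step sizes, and the extrapolation form used for Alex-GDA are illustrative assumptions, not taken from the paper; see the paper for the exact Alex-GDA update rule).

```python
import numpy as np

# Illustrative (hypothetical) strongly-convex-strongly-concave quadratic:
#   f(x, y) = (mu/2)||x||^2 + x^T B y - (mu/2)||y||^2,  saddle point at (0, 0)
mu = 1.0
B = np.array([[2.0, 0.5], [0.5, 1.0]])

def grad_x(x, y):
    return mu * x + B @ y

def grad_y(x, y):
    return B.T @ x - mu * y

def sim_gda(x, y, lr, steps):
    """Simultaneous GDA: both players use gradients at the same iterate."""
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - lr * gx, y + lr * gy
    return x, y

def alt_gda(x, y, lr, steps):
    """Alternating GDA: the ascent step sees the freshly updated x."""
    for _ in range(steps):
        x = x - lr * grad_x(x, y)
        y = y + lr * grad_y(x, y)  # uses the new x
    return x, y

def alex_gda(x, y, lr, gamma, steps):
    """Sketch of Alternating-Extrapolation GDA (Alex-GDA): gradients are
    taken at extrapolated iterates. The extrapolation coefficient `gamma`
    and this particular form are assumptions for illustration only."""
    x_prev, y_prev = x.copy(), y.copy()
    for _ in range(steps):
        y_tilde = y + gamma * (y - y_prev)        # extrapolated y
        x_new = x - lr * grad_x(x, y_tilde)
        x_tilde = x_new + gamma * (x_new - x)     # extrapolated x
        y_new = y + lr * grad_y(x_tilde, y)
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return x, y

x0 = np.array([1.0, -1.0])
y0 = np.array([0.5, 2.0])
print("Sim-GDA :", sim_gda(x0.copy(), y0.copy(), lr=0.1, steps=200))
print("Alt-GDA :", alt_gda(x0.copy(), y0.copy(), lr=0.1, steps=200))
print("Alex-GDA:", alex_gda(x0.copy(), y0.copy(), lr=0.1, gamma=0.5, steps=200))
```

On this toy problem all three variants drive the iterates toward the saddle point (0, 0); the paper's results concern how their iteration complexities compare in general, and how Alex-GDA additionally handles bilinear objectives where plain Sim-GDA and Alt-GDA diverge.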

Cite

Text

Lee et al. "Fundamental Benefit of Alternating Updates in Minimax Optimization." International Conference on Machine Learning, 2024.

Markdown

[Lee et al. "Fundamental Benefit of Alternating Updates in Minimax Optimization." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/lee2024icml-fundamental/)

BibTeX

@inproceedings{lee2024icml-fundamental,
  title     = {{Fundamental Benefit of Alternating Updates in Minimax Optimization}},
  author    = {Lee, Jaewook and Cho, Hanseul and Yun, Chulhee},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {26439--26514},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/lee2024icml-fundamental/}
}