PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization

Abstract

Alternating gradient descent (A-GD) is a simple but popular algorithm in machine learning, which updates two blocks of variables in an alternating manner using gradient descent steps. In this paper, we consider a smooth unconstrained nonconvex optimization problem and propose a perturbed A-GD (PA-GD) that converges (with high probability) to second-order stationary points (SOSPs) at a global sublinear rate. Existing analyses of A-GD-type algorithms either guarantee convergence only to first-order stationary points, or establish convergence to second-order stationary points only asymptotically (without rates). To the best of our knowledge, this is the first alternating-type algorithm that takes $\mathcal{O}(\text{polylog}(d)/\epsilon^2)$ iterations to achieve an $(\epsilon,\sqrt{\epsilon})$-SOSP with high probability, where $\text{polylog}(d)$ denotes a polynomial of the logarithm of the problem dimension $d$.
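
To illustrate the alternating-plus-perturbation template that the abstract describes, here is a minimal sketch in Python. It is not the paper's exact PA-GD procedure: the function and parameter names (`grad_x`, `grad_y`, `eta`, `radius`, `perturb_interval`) and all constants are placeholder assumptions, and the theoretical guarantees depend on parameter choices given in the paper, not on these defaults.

```python
import numpy as np

def pa_gd(grad_x, grad_y, x, y, eta=1e-2, eps=1e-3, radius=1e-3,
          perturb_interval=100, max_iter=10_000, rng=None):
    """Rough sketch of perturbed alternating gradient descent.

    grad_x(x, y) and grad_y(x, y) return the partial gradients of a
    smooth objective f(x, y) with respect to each block.  Step size,
    tolerance, perturbation radius, and perturbation interval are
    illustrative placeholders, not the constants from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    last_perturb = -perturb_interval  # allow a perturbation right away
    for t in range(max_iter):
        gx, gy = grad_x(x, y), grad_y(x, y)
        grad_norm = np.sqrt(np.linalg.norm(gx)**2 + np.linalg.norm(gy)**2)
        # Near a first-order stationary point, add a small random
        # perturbation to both blocks to help escape strict saddle points.
        if grad_norm <= eps and t - last_perturb >= perturb_interval:
            x = x + radius * rng.standard_normal(x.shape)
            y = y + radius * rng.standard_normal(y.shape)
            last_perturb = t
            continue
        # Alternating (Gauss-Seidel) gradient steps: update x first,
        # then update y using the freshly updated x.
        x = x - eta * gx
        y = y - eta * grad_y(x, y)
    return x, y
```

A small hypothetical usage: for a nonconvex objective such as $f(x,y) = \tfrac{1}{2}\|xy^\top - M\|_F^2$, one would pass the two partial gradients as `grad_x` and `grad_y` and random initial blocks `x`, `y`; the loop then alternates block gradient steps and only injects noise when the overall gradient is small.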

Cite

Text

Lu et al. "PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization." International Conference on Machine Learning, 2019.

Markdown

[Lu et al. "PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/lu2019icml-pagd/)

BibTeX

@inproceedings{lu2019icml-pagd,
  title     = {{PA-GD: On the Convergence of Perturbed Alternating Gradient Descent to Second-Order Stationary Points for Structured Nonconvex Optimization}},
  author    = {Lu, Songtao and Hong, Mingyi and Wang, Zhengdao},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {4134--4143},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/lu2019icml-pagd/}
}