First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems

Abstract

It is well known that both gradient descent and stochastic coordinate descent achieve a global convergence rate of $O(1/k)$ in the objective value when applied to the problem of minimizing an unconstrained convex function with Lipschitz-continuous gradient. In this work, we improve this rate to $o(1/k)$. We extend the result to proximal gradient and proximal coordinate descent on regularized problems, showing similar $o(1/k)$ convergence rates. The result is tight in the sense that a rate of $O(1/k^{1+\epsilon})$ is not attainable in general for any $\epsilon>0$ by any of these methods.
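As a quick illustration of the setting (a sketch, not code from the paper), the snippet below runs gradient descent with the standard $1/L$ step size on a smooth convex least-squares objective and prints $k\,(f(x_k)-f^*)$. The classical $O(1/k)$ analysis only guarantees that this quantity stays bounded, whereas the paper's result implies it tends to zero. The problem data, dimensions, and iteration counts here are arbitrary choices made for the example.

```python
# Sketch: gradient descent with step size 1/L on a smooth convex
# least-squares objective, tracking k * (f(x_k) - f*). The problem data
# and iteration budget are illustrative assumptions, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
b = rng.standard_normal(100)

def f(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

def grad_f(x):
    return A.T @ (A @ x - b)

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f: ||A||_2^2
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
f_star = f(x_star)                      # optimal objective value

x = np.zeros(A.shape[1])
for k in range(1, 10001):
    x = x - grad_f(x) / L               # gradient step with step size 1/L
    if k % 2000 == 0:
        # Classical theory bounds k * (objective gap); the o(1/k) result
        # says this product in fact converges to zero.
        print(k, k * (f(x) - f_star))
```

For the regularized problems treated in the paper, the analogous experiment would replace the plain gradient step with a proximal gradient step (e.g., soft-thresholding for an $\ell_1$ regularizer) and track the same scaled objective gap.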

Cite

Text

Lee and Wright. "First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems." International Conference on Machine Learning, 2019.

Markdown

[Lee and Wright. "First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/lee2019icml-firstorder/)

BibTeX

@inproceedings{lee2019icml-firstorder,
  title     = {{First-Order Algorithms Converge Faster than $O(1/k)$ on Convex Problems}},
  author    = {Lee, Ching-Pei and Wright, Stephen},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {3754--3762},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/lee2019icml-firstorder/}
}