Gradient Methods with Online Scaling

Abstract

We introduce a framework to accelerate the convergence of gradient-based methods with online learning. The framework learns to scale the gradient at each iteration through an online learning algorithm and provably accelerates gradient-based methods asymptotically. In contrast with previous literature, where convergence is established based on worst-case analysis, our framework provides a strong convergence guarantee with respect to the optimal stepsize for the iteration trajectory. For smooth strongly convex optimization, our framework provides an $\mathcal{O}(\kappa^\star \log(1/\varepsilon))$ asymptotic complexity result, where $\kappa^\star$ is the condition number achievable by the optimal preconditioner, improving on the previous $\mathcal{O}(\sqrt{n}\kappa^\star \log(1/\varepsilon))$ result. For smooth convex optimization, we obtain the first convergence guarantee for the widely used hypergradient descent heuristic.
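The hypergradient descent heuristic mentioned in the abstract adapts the stepsize online: after each iterate, the scalar stepsize is nudged in the direction of the inner product of successive gradients. The sketch below is illustrative only, not the paper's algorithm or analysis; the function names, the toy quadratic, and the constants (`alpha0`, `beta`) are assumptions chosen for demonstration.

```python
import numpy as np

def hypergradient_descent(grad, x0, alpha0=0.01, beta=1e-5, iters=2000):
    """Gradient descent with an online-adapted scalar stepsize.

    The stepsize update alpha += beta * <g_t, g_{t-1}> is the classic
    hypergradient heuristic: it increases the stepsize while successive
    gradients align and decreases it when they point in opposing directions.
    """
    x, alpha = x0.astype(float), alpha0
    g_prev = grad(x)
    for _ in range(iters):
        g = grad(x)
        # online stepsize update driven by gradient alignment
        alpha += beta * (g @ g_prev)
        x = x - alpha * g
        g_prev = g
    return x, alpha

# toy smooth strongly convex quadratic f(x) = 0.5 * x^T A x (illustrative)
A = np.diag([1.0, 2.0])
grad = lambda x: A @ x
x_final, alpha_final = hypergradient_descent(grad, np.array([3.0, 3.0]))
```

On this well-conditioned quadratic, the stepsize grows while the iterates make aligned progress and the method converges without manual tuning; the paper's framework replaces this heuristic scalar update with an online learning algorithm over gradient scalings that carries a provable trajectory-dependent guarantee.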

Cite

Text

Gao et al. "Gradient Methods with Online Scaling." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.

Markdown

[Gao et al. "Gradient Methods with Online Scaling." Proceedings of Thirty Eighth Conference on Learning Theory, 2025.](https://mlanthology.org/colt/2025/gao2025colt-gradient/)

BibTeX

@inproceedings{gao2025colt-gradient,
  title     = {{Gradient Methods with Online Scaling}},
  author    = {Gao, Wenzhi and Chu, Ya-Chi and Ye, Yinyu and Udell, Madeleine},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  year      = {2025},
  pages     = {2192--2226},
  volume    = {291},
  url       = {https://mlanthology.org/colt/2025/gao2025colt-gradient/}
}