On the Convergence Direction of Gradient Descent
Abstract
Gradient descent (GD) is a fundamental optimization method in deep learning, yet its asymptotic directional properties remain less well understood. In this paper, we prove that if GD converges, its trajectory either aligns with a fixed direction or oscillates along a specific line. Fixed-direction convergence occurs under small learning rates, while oscillatory convergence emerges under large learning rates. This result offers a new lens for understanding long-term GD dynamics. Experimentally, we find that this directional convergence behavior also appears in stochastic gradient descent (SGD) and Adam. Furthermore, we discuss how the theoretical findings on oscillatory convergence might offer a perspective on the sharpness dynamics observed in the Edge of Stability (EoS) regime. Our work provides both theoretical clarity and practical insight into the dynamics of multiple optimization methods.
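The two regimes described in the abstract can be seen on a toy problem. The sketch below is an illustration only, not the paper's setting or proof: it assumes a two-dimensional diagonal quadratic f(x) = 0.5 * sum(eigs[i] * x[i]**2), and the names `gd_directions`, `cosine`, `eigs`, and the learning rates 0.1 and 0.45 are hypothetical choices made for this example. On such a quadratic, each coordinate is multiplied by (1 - lr * eigs[i]) per step, so the iterate's direction is eventually dominated by the coordinate with the largest absolute factor, and the sign of that factor decides whether the direction stays fixed or flips every step.

```python
import math

def gd_directions(eigs, lr, x0, steps):
    """Run GD on f(x) = 0.5 * sum(eigs[i] * x[i]**2) and return the unit
    direction of the iterate after each step (the minimizer is the origin)."""
    x = list(x0)
    dirs = []
    for _ in range(steps):
        # The gradient of the diagonal quadratic is eigs[i] * x[i].
        x = [xi - lr * lam * xi for xi, lam in zip(x, eigs)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        dirs.append([xi / norm for xi in x])
    return dirs

def cosine(u, v):
    """Cosine similarity of two unit vectors."""
    return sum(a * b for a, b in zip(u, v))

# Assumed toy curvatures; GD converges here for any lr < 2 / max(eigs) = 0.5.
eigs = [1.0, 4.0]
x0 = [1.0, 1.0]

# Small learning rate: per-step factors are 0.9 and 0.6, both positive.
small = gd_directions(eigs, lr=0.1, x0=x0, steps=60)
# Large learning rate: factors are 0.55 and -0.8; the negative one dominates.
large = gd_directions(eigs, lr=0.45, x0=x0, steps=60)

# Cosine between consecutive directions: near +1 means a fixed direction,
# near -1 means the iterate flips sign each step along one line.
small_cos = cosine(small[-1], small[-2])
large_cos = cosine(large[-1], large[-2])
print("small lr, consecutive-direction cosine:", small_cos)
print("large lr, consecutive-direction cosine:", large_cos)
```

In this toy case the small learning rate leaves the iterate approaching the minimizer along the low-curvature axis, while the large one leaves it alternating sides along the high-curvature axis. Whether and how this carries over to general losses is what the paper itself addresses.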
Cite
Text
Chen et al. "On the Convergence Direction of Gradient Descent." International Conference on Learning Representations, 2026.
Markdown
[Chen et al. "On the Convergence Direction of Gradient Descent." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-convergence/)
BibTeX
@inproceedings{chen2026iclr-convergence,
title = {{On the Convergence Direction of Gradient Descent}},
author = {Chen, Shuo and Li, Xiaolong and Peng, Jiaying and Zhao, Yao},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/chen2026iclr-convergence/}
}