A Novel Analysis of Gradient Descent Under Directional Smoothness

Abstract

We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper bounds on the objective. Minimizing these upper bounds requires solving an implicit equation to obtain an adapted step size; we show that this equation is straightforward to solve for convex quadratics and leads to new guarantees for a classical step-size sequence. For general functions, we prove that exponential search can be used to obtain a path-dependent convergence guarantee with only a log-log dependency on the global smoothness constant. Experiments on quadratic functions showcase the utility of our theory and connections to the edge-of-stability phenomenon.
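To make the adapted step-size idea concrete, the following is a minimal sketch (not the authors' code) that assumes the point-wise directional smoothness M(x, y) = ||grad f(y) - grad f(x)|| / ||y - x|| and the implicit step-size equation eta = 1 / M(x, x - eta * grad f(x)); the exact definitions and guarantees are in the paper. For a convex quadratic f(x) = 0.5 x^T A x - b^T x, the smoothness along the gradient direction equals ||A g|| / ||g|| for every eta, so the implicit equation has the closed-form solution eta = ||g|| / ||A g||.

# Minimal illustrative sketch, not the paper's implementation. The definitions of
# M(x, y) and the implicit equation below are assumptions used to mirror the abstract.
import numpy as np

def directional_smoothness(grad_f, x, y, eps=1e-12):
    """Gradient variation between x and y, normalized by the distance (assumed definition)."""
    return np.linalg.norm(grad_f(y) - grad_f(x)) / max(np.linalg.norm(y - x), eps)

def adapted_step_size(grad_f, x, eta0=1.0, iters=20):
    """Heuristic fixed-point iteration for eta = 1 / M(x, x - eta * grad f(x)).
    On quadratics M does not depend on eta, so a single pass already solves the equation;
    for general functions this naive iteration is not guaranteed to converge."""
    g = grad_f(x)
    eta = eta0
    for _ in range(iters):
        eta = 1.0 / directional_smoothness(grad_f, x, x - eta * g)
    return eta

# Convex quadratic f(x) = 0.5 * x^T A x - b^T x: here M(x, x - eta * g) = ||A g|| / ||g||
# for any eta > 0, so the adapted step size is eta = ||g|| / ||A g|| in closed form.
rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 20))
A = Q.T @ Q + np.eye(20)              # symmetric positive definite
b = rng.standard_normal(20)
grad_f = lambda x: A @ x - b

x = np.zeros(20)
g = grad_f(x)
# Sanity check: the fixed-point solve matches the closed form on this quadratic.
assert np.isclose(adapted_step_size(grad_f, x), np.linalg.norm(g) / np.linalg.norm(A @ g))

for _ in range(500):
    g = grad_f(x)
    eta = np.linalg.norm(g) / np.linalg.norm(A @ g)   # closed-form adapted step
    x = x - eta * g
print("final gradient norm:", np.linalg.norm(grad_f(x)))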

Cite

Text

Mishkin et al. "A Novel Analysis of Gradient Descent Under Directional Smoothness." NeurIPS 2023 Workshops: OPT, 2023.

Markdown

[Mishkin et al. "A Novel Analysis of Gradient Descent Under Directional Smoothness." NeurIPS 2023 Workshops: OPT, 2023.](https://mlanthology.org/neuripsw/2023/mishkin2023neuripsw-novel/)

BibTeX

@inproceedings{mishkin2023neuripsw-novel,
  title     = {{A Novel Analysis of Gradient Descent Under Directional Smoothness}},
  author    = {Mishkin, Aaron and Khaled, Ahmed and Defazio, Aaron and Gower, Robert M.},
  booktitle = {NeurIPS 2023 Workshops: OPT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/mishkin2023neuripsw-novel/}
}