Sharp Analysis of Stochastic Optimization Under Global Kurdyka-Lojasiewicz Inequality
Abstract
We study the complexity of finding the global solution to stochastic nonconvex optimization when the objective function satisfies the global Kurdyka-{\L}ojasiewicz (KL) inequality and the queries from stochastic gradient oracles satisfy a mild expected smoothness assumption. We first introduce a general framework to analyze Stochastic Gradient Descent (SGD) and its associated nonlinear dynamics under this setting. As a byproduct of our analysis, we obtain a sample complexity of $\mathcal{O}(\epsilon^{-(4-\alpha)/\alpha})$ for SGD when the objective satisfies the so-called $\alpha$-P{\L} condition, where $\alpha$ is the degree of gradient domination. Furthermore, we show that a modified SGD with variance reduction and restarting (PAGER) achieves an improved sample complexity of $\mathcal{O}(\epsilon^{-2/\alpha})$ when the objective satisfies an average smoothness assumption. This leads to the first optimal algorithm for the important case of $\alpha=1$, which appears in applications such as policy optimization in reinforcement learning.
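For concreteness, below is a minimal LaTeX sketch of how the $\alpha$-P{\L} (gradient domination) condition is commonly written and how the quoted sample complexities specialize; the constant $\mu$ and the exact normalization are illustrative assumptions and may differ from the paper's formulation.

% Hedged sketch of the alpha-PL / gradient-domination condition (constants illustrative):
\[
  f(x) - f^{*} \;\le\; \mu \, \|\nabla f(x)\|^{\alpha}, \qquad \alpha \in (0, 2],
\]
% alpha = 2 recovers the classical Polyak-Lojasiewicz condition, while alpha = 1 is the
% weak gradient-domination regime arising, e.g., in policy optimization.
% Specializing the complexities stated in the abstract at alpha = 1:
\[
  \underbrace{\mathcal{O}\bigl(\epsilon^{-(4-\alpha)/\alpha}\bigr)}_{\text{SGD}}
  \;=\; \mathcal{O}(\epsilon^{-3}),
  \qquad
  \underbrace{\mathcal{O}\bigl(\epsilon^{-2/\alpha}\bigr)}_{\text{PAGER}}
  \;=\; \mathcal{O}(\epsilon^{-2}).
\]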
Cite
Text
Fatkhullin et al. "Sharp Analysis of Stochastic Optimization Under Global Kurdyka-Lojasiewicz Inequality." Neural Information Processing Systems, 2022.
Markdown
[Fatkhullin et al. "Sharp Analysis of Stochastic Optimization Under Global Kurdyka-Lojasiewicz Inequality." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/fatkhullin2022neurips-sharp/)
BibTeX
@inproceedings{fatkhullin2022neurips-sharp,
title = {{Sharp Analysis of Stochastic Optimization Under Global Kurdyka-Lojasiewicz Inequality}},
author = {Fatkhullin, Ilyas and Etesami, Jalal and He, Niao and Kiyavash, Negar},
booktitle = {Neural Information Processing Systems},
year = {2022},
url = {https://mlanthology.org/neurips/2022/fatkhullin2022neurips-sharp/}
}