Provable Accelerated Gradient Method for Nonconvex Low Rank Optimization
Abstract
Optimization over low rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex problem $\min_{\mathbf{U}\in\mathbb{R}^{n\times r}} g(\mathbf{U}) = f(\mathbf{U}\mathbf{U}^T)$ under the assumptions that $f(\mathbf{X})$ is restricted $\mu$-strongly convex and $L$-smooth on the set $\{\mathbf{X}: \mathbf{X}\succeq 0, \operatorname{rank}(\mathbf{X})\le r\}$. We propose an accelerated gradient method with alternating constraint that operates directly on the $\mathbf{U}$ factors, and show that the method has a local linear convergence rate with the optimal $\sqrt{L/\mu}$ dependence on the condition number $L/\mu$. Globally, our method converges to a critical point with zero gradient from any initializer. Our method also applies to the asymmetric factorization $\mathbf{X}=\widetilde{\mathbf{U}}\widetilde{\mathbf{V}}^T$, for which the same convergence result holds. Extensive experimental results verify the advantage of our method.
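To make the factored-gradient idea concrete, below is a minimal sketch of Nesterov-style momentum applied directly to the factor $\mathbf{U}$. It is an illustration under assumptions, not the paper's alternating-constraint algorithm: the objective $f(\mathbf{X}) = \frac{1}{2}\|\mathbf{X}-\mathbf{M}\|_F^2$ with a PSD target $\mathbf{M}$ is a hypothetical choice (the paper covers any restricted strongly convex and smooth $f$), the gradient on the factor follows from the chain rule $\nabla g(\mathbf{U}) = (\nabla f(\mathbf{X}) + \nabla f(\mathbf{X})^T)\mathbf{U}$, and the step size and momentum values are illustrative.

```python
import numpy as np

# Hypothetical instance: f(X) = 0.5 * ||X - M||_F^2 for a PSD target M,
# so grad f(X) = X - M. This f is only an example; the paper's results
# hold for any restricted strongly convex and smooth f.
def grad_f(X, M):
    return X - M

def grad_g(U, M):
    # Chain rule for g(U) = f(U U^T): grad g(U) = (grad f + grad f^T) U.
    G = grad_f(U @ U.T, M)
    return (G + G.T) @ U

def factored_momentum_gd(M, r, iters=500, momentum=0.9, seed=0):
    # Plain Nesterov-style momentum on the factor U -- an illustrative
    # baseline, not the paper's alternating-constraint method.
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    step = 0.1 / np.linalg.norm(M, 2)  # crude step size tied to the scale of M
    U = rng.standard_normal((n, r)) / np.sqrt(n)
    U_prev = U.copy()
    for _ in range(iters):
        V = U + momentum * (U - U_prev)  # extrapolation (look-ahead) point
        U_prev = U
        U = V - step * grad_g(V, M)      # gradient step at the look-ahead point
    return U

# Usage: recover a rank-2 PSD matrix M = A A^T via the factored objective.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 2))
M = A @ A.T
U = factored_momentum_gd(M, r=2)
print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```

Working on $\mathbf{U}$ keeps every iterate at rank at most $r$ without a projection step, which is the appeal of the factorization heuristic the abstract describes; the paper's contribution is making such momentum provably accelerated despite the nonconvexity of $g$.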
Cite
Text
Li and Lin. "Provable Accelerated Gradient Method for Nonconvex Low Rank Optimization." Machine Learning, 2020. doi:10.1007/S10994-019-05819-WMarkdown
[Li and Lin. "Provable Accelerated Gradient Method for Nonconvex Low Rank Optimization." Machine Learning, 2020.](https://mlanthology.org/mlj/2020/li2020mlj-provable/) doi:10.1007/S10994-019-05819-W
BibTeX
@article{li2020mlj-provable,
title = {{Provable Accelerated Gradient Method for Nonconvex Low Rank Optimization}},
author = {Li, Huan and Lin, Zhouchen},
journal = {Machine Learning},
year = {2020},
pages = {103--134},
doi = {10.1007/S10994-019-05819-W},
volume = {109},
url = {https://mlanthology.org/mlj/2020/li2020mlj-provable/}
}