On the Crucial Role of Initialization for Matrix Factorization

Abstract

This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization. We introduce Nyström initialization, which significantly improves the global convergence of Scaled Gradient Descent (ScaledGD) in both symmetric and asymmetric matrix factorization tasks. Specifically, we prove that ScaledGD with Nyström initialization achieves quadratic convergence in cases where only linear rates were previously known. Finally, we equip low-rank adapters (LoRA) with Nyström initialization for practical merits. The effectiveness of the resulting approach, NoRA, is demonstrated on several representative tasks for fine-tuning large language models (LLMs).
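
To make the setting concrete, here is a minimal, hypothetical sketch (not the authors' code) of the symmetric case M ≈ XXᵀ: a Nyström-style initialization takes X₀ = MΩ for a random Gaussian sketch Ω, and ScaledGD then preconditions each gradient step on the right by (XᵀX)⁻¹. The function names, sketch scaling, and step size below are illustrative assumptions, not the paper's implementation.

import numpy as np

def nystrom_init(M, r, rng):
    # Sketch-based initialization: X0 = M @ Omega with a Gaussian sketch Omega (n x r).
    Omega = rng.standard_normal((M.shape[0], r)) / np.sqrt(M.shape[0])  # assumed scaling
    return M @ Omega

def scaled_gd(M, X0, eta=0.5, iters=50):
    # ScaledGD for min_X 0.25 * ||X X^T - M||_F^2:
    # the plain gradient (X X^T - M) X is preconditioned on the right by (X^T X)^{-1}.
    X = X0.copy()
    for _ in range(iters):
        grad = (X @ X.T - M) @ X
        X = X - eta * grad @ np.linalg.inv(X.T @ X)
    return X

rng = np.random.default_rng(0)
n, r = 50, 3
U = rng.standard_normal((n, r))
M = U @ U.T                                   # rank-r PSD target matrix
X = scaled_gd(M, nystrom_init(M, r, rng))
print(np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))  # relative reconstruction error

The asymmetric and LoRA (NoRA) variants follow the same sketching idea applied to each factor; see the paper for the precise initializers and guarantees.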

Cite

Text

Li et al. "On the Crucial Role of Initialization for Matrix Factorization." NeurIPS 2024 Workshops: OPT, 2024.

Markdown

[Li et al. "On the Crucial Role of Initialization for Matrix Factorization." NeurIPS 2024 Workshops: OPT, 2024.](https://mlanthology.org/neuripsw/2024/li2024neuripsw-crucial/)

BibTeX

@inproceedings{li2024neuripsw-crucial,
  title     = {{On the Crucial Role of Initialization for Matrix Factorization}},
  author    = {Li, Bingcong and Zhang, Liang and Mokhtari, Aryan and He, Niao},
  booktitle = {NeurIPS 2024 Workshops: OPT},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/li2024neuripsw-crucial/}
}