Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing Its Preconditioner

Abstract

The recent success of Shampoo in the AlgoPerf contest has sparked renewed interest in Kronecker-factorization-based optimization algorithms for training neural networks. Despite its success, Shampoo relies heavily on several heuristics, such as learning rate grafting and stale preconditioning, to achieve performance at scale. These heuristics increase algorithmic complexity, necessitate further hyperparameter tuning, and lack theoretical justification. This paper investigates these heuristics from the angle of Frobenius norm approximation to full-matrix Adam and decouples the updates of the preconditioner's eigenvalues and eigenbasis. We show that grafting from Adam mitigates the staleness and mis-scaling of the preconditioner's *eigenvalues*, and that correcting the eigenvalues directly eliminates the need for learning rate grafting. To manage the error induced by infrequent *eigenbasis* computations, we propose an adaptive criterion for determining the eigenbasis computation frequency, motivated by the termination criterion of a warm-started QR algorithm. This criterion decouples the update frequency of different preconditioner matrices and enables us to investigate the impact of approximation error on convergence. These practical techniques offer a principled angle towards removing Shampoo's heuristics and developing improved Kronecker-factorization-based training algorithms.
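To make the eigenvalue/eigenbasis split more concrete, the following NumPy sketch illustrates the two ideas for a single weight matrix. It is not the authors' implementation: the function names, the tolerance `tol`, the Adam-style second-moment update, and the use of the relative off-diagonal Frobenius norm as the staleness measure are all assumptions made for illustration.

```python
import numpy as np


def offdiag_rel_error(A, Q):
    """Relative Frobenius norm of the off-diagonal part of Q^T A Q.

    Close to zero when the columns of Q are (nearly) eigenvectors of A.
    Assumed staleness measure, chosen for illustration only.
    """
    B = Q.T @ A @ Q
    off = B - np.diag(np.diag(B))
    return np.linalg.norm(off) / np.linalg.norm(B)


def maybe_refresh_eigenbasis(A, Q, tol=0.1):
    """Recompute the eigenbasis of A only if the stale basis Q is too inaccurate.

    `tol` is a hypothetical threshold. Each Kronecker factor can be checked
    independently, so different factors may refresh at different steps.
    """
    if offdiag_rel_error(A, Q) > tol:
        _, Q_new = np.linalg.eigh(A)  # A is assumed symmetric
        return Q_new, True
    return Q, False


def eigenvalue_corrected_update(G, QL, QR, V, beta2=0.999, eps=1e-8):
    """Scale the gradient inside the Kronecker eigenbasis, then rotate back.

    G:  gradient of shape (m, n)
    QL: (m, m) eigenbasis of the left factor, QR: (n, n) of the right factor
    V:  running second moment of the rotated gradient, shape (m, n)
    The scaling is refreshed every step even when QL and QR are stale.
    """
    G_rot = QL.T @ G @ QR                       # rotate into the eigenbasis
    V = beta2 * V + (1.0 - beta2) * G_rot**2    # Adam-style eigenvalue estimate
    step = QL @ (G_rot / (np.sqrt(V) + eps)) @ QR.T
    return step, V
```

In a training loop under these assumptions, one would accumulate the left and right factors (for example, running averages of `G @ G.T` and `G.T @ G`), call `maybe_refresh_eigenbasis` on each factor separately, and call `eigenvalue_corrected_update` at every step. With identity bases the update reduces to an Adam-like elementwise scaling, which is one way to see why separate eigenvalue correction can stand in for grafting from Adam.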

Cite

Text

Eschenhagen et al. "Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing Its Preconditioner." Advances in Neural Information Processing Systems, 2025.

Markdown

[Eschenhagen et al. "Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing Its Preconditioner." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/eschenhagen2025neurips-purifying/)

BibTeX

@inproceedings{eschenhagen2025neurips-purifying,
  title     = {{Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing Its Preconditioner}},
  author    = {Eschenhagen, Runa and Defazio, Aaron and Lee, Tsung-Hsien and Turner, Richard E. and Shi, Hao-Jun Michael},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/eschenhagen2025neurips-purifying/}
}