SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training
Abstract
Training large language models (LLMs) is highly resource-intensive due to their massive number of parameters and the overhead of optimizer states. While recent work has aimed to reduce memory consumption, such efforts often entail trade-offs among memory efficiency, training time, and model performance. Yet, true democratization of LLMs requires simultaneous progress across all three dimensions. To this end, we propose SubTrack++, which leverages Grassmannian gradient subspace tracking combined with projection-aware optimizers, enabling Adam’s internal statistics to adapt to subspace changes. Additionally, recovery scaling, a technique that restores information lost through low-rank projections, further enhances model performance. Our method demonstrates SOTA convergence by exploiting Grassmannian geometry, **reducing training wall-time by up to 65%** compared to the best-performing baseline, LDAdam, while preserving the reduced memory footprint. Code is available at https://github.com/criticalml-uw/SubTrack.
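To make the core idea concrete, below is a minimal PyTorch-style sketch of low-rank gradient projection with a subspace tracked on the Grassmann manifold and an Adam-style update applied to the projected coefficients. The function names, rank `r`, subspace step size `eta_sub`, and recovery-scaling factor `alpha` are illustrative assumptions, not the SubTrack++ implementation from the paper or repository.

```python
# Illustrative sketch (not the authors' code): track a rank-r gradient subspace
# on the Grassmannian and run an Adam-style step on the projected coefficients.
import torch


def init_subspace(grad: torch.Tensor, r: int) -> torch.Tensor:
    """Initialize an orthonormal basis Q (m x r) from the top-r left singular
    vectors of the first gradient matrix."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :r]


def track_subspace(Q: torch.Tensor, grad: torch.Tensor, eta_sub: float) -> torch.Tensor:
    """One Grassmannian ascent step on f(Q) = ||Q^T G||_F^2, followed by a
    QR retraction back onto the manifold."""
    residual = grad - Q @ (Q.T @ grad)          # part of grad outside span(Q)
    direction = residual @ (grad.T @ Q)          # Riemannian gradient (I - QQ^T) G G^T Q
    Q_new, _ = torch.linalg.qr(Q + eta_sub * direction)
    return Q_new


def low_rank_step(param, grad, Q, adam_state, lr=1e-3, alpha=1.0,
                  betas=(0.9, 0.999), eps=1e-8):
    """Project the gradient into the tracked subspace, apply an Adam-style
    update to the r x n coefficients, then map the update back to full space.
    `alpha` stands in for a recovery-scaling factor compensating for energy
    lost by the low-rank projection (illustrative)."""
    g_low = Q.T @ grad                           # r x n projected gradient
    m, v, t = adam_state
    t += 1
    m = betas[0] * m + (1 - betas[0]) * g_low
    v = betas[1] * v + (1 - betas[1]) * g_low ** 2
    m_hat = m / (1 - betas[0] ** t)
    v_hat = v / (1 - betas[1] ** t)
    update_low = m_hat / (v_hat.sqrt() + eps)
    with torch.no_grad():
        param -= lr * alpha * (Q @ update_low)   # project update back to full space
    return param, (m, v, t)
```

A full training loop would initialize `adam_state` as zero moment tensors of shape `r x n` with a step counter of 0, call `track_subspace` periodically as gradients drift, and apply projection-aware corrections to Adam's moments whenever the subspace changes; those details are omitted here.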
Cite
Text
Rajabi et al. "SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training." Advances in Neural Information Processing Systems, 2025.

Markdown
[Rajabi et al. "SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/rajabi2025neurips-subtrack/)

BibTeX
@inproceedings{rajabi2025neurips-subtrack,
  title = {{SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training}},
  author = {Rajabi, Sahar and Nonta, Nayeema and Rambhatla, Sirisha},
  booktitle = {Advances in Neural Information Processing Systems},
  year = {2025},
  url = {https://mlanthology.org/neurips/2025/rajabi2025neurips-subtrack/}
}