Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks

Abstract

The increasing GPU memory demands of large language models call for more memory-efficient training methods. Existing approaches like LoRA struggle with low-rank constraints in pre-training, while ReLoRA suffers from saddle point issues. We propose **Sparse Spectral Training (SST)**, a memory-efficient **pre-training** framework that *updates all singular values*, *selectively updates singular vectors* via multinomial sampling, and *leverages singular value decomposition (SVD) for initialization and periodic reinitialization*, reducing distortion compared to other low-rank methods. Across tasks including language generation, machine translation, and graph learning, SST outperforms existing memory-efficient training methods and is often comparable to full-rank training. On LLaMA-1.3B, SST reduces the perplexity gap to full-rank training by **97.4%**, demonstrating its effectiveness for scalable, memory-efficient model pre-training. Our code is available at https://anonymous.4open.science/r/sparse_spectral_training-6A2C/.
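To make the abstract's description concrete, below is a minimal sketch of a spectral linear layer in the spirit of SST. It is an illustration under assumptions, not the authors' implementation: the class name `SSTLinear`, the arguments `rank` and `n_active`, the methods `resample` and `reinitialize`, and the choice to sample singular-vector pairs with probability proportional to the current singular values are all hypothetical; the paper only states that all singular values are updated, singular vectors are selected via multinomial sampling, and SVD is used for initialization and periodic reinitialization.

```python
import torch
import torch.nn as nn


class SSTLinear(nn.Module):
    """Sketch of a sparse-spectral-style linear layer (illustrative only).

    The weight is kept in SVD form, W = U diag(S) V^T: every singular value
    in S is trained at each step, while only a sampled subset of
    singular-vector pairs (columns of U and V) receives gradients between
    periodic SVD reinitializations.
    """

    def __init__(self, in_features, out_features, rank, n_active=8):
        super().__init__()
        W = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(W)
        # SVD-based initialization of the low-rank factors.
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        r = min(rank, S.numel())
        self.U = nn.Parameter(U[:, :r].contiguous())   # (out, r)
        self.S = nn.Parameter(S[:r].contiguous())      # (r,)
        self.V = nn.Parameter(Vh[:r].T.contiguous())   # (in, r)
        self.n_active = min(n_active, r)
        self.register_buffer("mask", torch.zeros(r), persistent=False)
        self.resample()

    @torch.no_grad()
    def resample(self):
        # Multinomial sampling of which singular-vector pairs to update;
        # probabilities proportional to singular values are an assumption.
        probs = self.S.abs() + 1e-8
        idx = torch.multinomial(probs, self.n_active, replacement=False)
        self.mask.zero_()
        self.mask[idx] = 1.0

    @torch.no_grad()
    def reinitialize(self):
        # Periodic SVD re-factorization of the reconstructed weight,
        # followed by a fresh round of sampling.
        W = (self.U * self.S) @ self.V.T
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)
        r = self.S.numel()
        self.U.copy_(U[:, :r])
        self.S.copy_(S[:r])
        self.V.copy_(Vh[:r].T)
        self.resample()

    def forward(self, x):
        # All of S gets gradients; only the sampled columns of U and V do
        # (the rest are detached, i.e. held constant for this round).
        U = self.U * self.mask + self.U.detach() * (1.0 - self.mask)
        V = self.V * self.mask + self.V.detach() * (1.0 - self.mask)
        return ((x @ V) * self.S) @ U.T
```

In a training loop, one would presumably call `reinitialize()` (or at least `resample()`) every fixed number of steps so that, over time, all singular directions receive updates while only a small set of vector parameters is active at once, which is where the memory saving comes from.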

Cite

Text

Zhao et al. "Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Zhao et al. "Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/zhao2025iclrw-sparse/)

BibTeX

@inproceedings{zhao2025iclrw-sparse,
  title     = {{Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks}},
  author    = {Zhao, Jialin and Zhang, Yingtao and Li, Xinghang and Liu, Huaping and Cannistraci, Carlo Vittorio},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/zhao2025iclrw-sparse/}
}