FLAR-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression

Thoma, Moritz; Villasante, Jorge; Aghajanzadeh, Emad; Sampath, Shambhavi Balamuthu; Morì, Pierpaolo; Groetzinger, Maximilian; Dylkin, Daniil; Vemparala, Manoj Rohit; Fasfous, Nael; Frickenstein, Alexander; Mueller-Gritschneder, Daniel; Schlichtmann, Ulf

CVPRW 2025 pp. 1898-1907

Abstract

Advanced deep learning architectures have achieved exceptional prediction performance but come with significant computational demands, which makes deployment on resource-constrained edge devices challenging. While pruning techniques offer a way to reduce model complexity, they often lead to substantial accuracy loss and can require extensive retraining. Alternatively, Singular Value Decomposition (SVD) provides a promising solution by decomposing model weights into lower-dimensional representations, thus maintaining a closer representation of the original features and preserving accuracy. Despite progress in this domain, approaches targeting vision model architectures typically rely on uniform compression or on slow, computationally expensive rank search methods that do not account for latency improvements. In this paper, we introduce Fast, Latency-Aware Rank Singular Value Decomposition (FLAR-SVD), a novel approach that leverages inherent SVD properties to accelerate the rank search process and incorporates latency tuning to further optimize performance for hardware targets. We demonstrate the capability of our approach across CNN, ViT and Mamba architectures on both server and edge hardware. For DeiT, we achieve 81.0% accuracy on ImageNet with only 1 epoch of fine-tuning, while reducing latency by 30% over the baseline. Code will be published upon acceptance of the paper.
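
As a rough illustration of the general idea the abstract builds on, the sketch below factorizes a single weight matrix with a truncated SVD and picks the rank from the singular-value spectrum. It is not the FLAR-SVD rank search or its latency tuning, which the abstract does not specify. The function names `low_rank_factorize` and `rank_for_energy`, the energy threshold, and the synthetic weight matrix are all assumptions made for this example.

```python
import numpy as np


def low_rank_factorize(weight, rank):
    """Split an (out x in) weight into A (out x rank) and B (rank x in)
    using a truncated SVD, so that A @ B approximates weight."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b


def rank_for_energy(weight, energy=0.95):
    """Hypothetical heuristic (not the paper's method): smallest rank whose
    singular values keep the given fraction of the squared spectral energy."""
    s = np.linalg.svd(weight, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    rank = int(np.searchsorted(cumulative, energy)) + 1
    return min(rank, len(s))


rng = np.random.default_rng(0)
# Synthetic weight: approximately rank 8, plus a small amount of noise.
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 128))
weight += 0.01 * rng.standard_normal((64, 128))

rank = rank_for_energy(weight, energy=0.99)
a, b = low_rank_factorize(weight, rank)

# Relative Frobenius error of the approximation and parameter counts.
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
params_before = weight.size
params_after = a.size + b.size
print(rank, rel_error, params_before, params_after)
```

Replacing one dense layer with the two factors reduces the parameter count whenever `rank * (out + in) < out * in`. Fewer parameters do not automatically mean lower latency on a given device, which is the gap the abstract says latency-aware rank selection is meant to address.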

Cite

Text

Thoma et al. "FLAR-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Thoma et al. "FLAR-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/thoma2025cvprw-flarsvd/)

BibTeX

@inproceedings{thoma2025cvprw-flarsvd,
  title     = {{FLAR-SVD: Fast and Latency-Aware Singular Value Decomposition for Model Compression}},
  author    = {Thoma, Moritz and Villasante, Jorge and Aghajanzadeh, Emad and Sampath, Shambhavi Balamuthu and Morì, Pierpaolo and Groetzinger, Maximilian and Dylkin, Daniil and Vemparala, Manoj Rohit and Fasfous, Nael and Frickenstein, Alexander and Mueller-Gritschneder, Daniel and Schlichtmann, Ulf},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {1898-1907},
  url       = {https://mlanthology.org/cvprw/2025/thoma2025cvprw-flarsvd/}
}