Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning
Abstract
We explore the use of matrix product operators (MPOs) to compress transformer-based architectures. By factorizing full-rank weight matrices into a product of small tensor-train cores, MPOs reduce both memory footprint and computational cost, which is critical for deployment on resource-constrained devices. Our experiments on speaker identification using the LibriSpeech train-clean-360 subset show that MPO-based models, and even their pruned variants, maintain high performance with far fewer parameters than full-rank transformers. We detail the mathematical principles underlying low-rank factorization and unstructured pruning and discuss next steps for extending this approach to more complex tasks such as automatic speech recognition (ASR).
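The sketch below is a rough, self-contained illustration of the two ideas named in the abstract, not the authors' released code: a dense weight matrix is factorized into MPO (tensor-train) cores with a truncated TT-SVD, and unstructured magnitude pruning zeroes the smallest-magnitude entries. The matrix shape (256 x 256), the factor dimensions, the bond rank of 16, and the 50% sparsity are illustrative assumptions.

```python
import numpy as np


def mpo_factorize(W, in_dims, out_dims, max_bond):
    """Factorize W (prod(in_dims) x prod(out_dims)) into MPO cores via sequential truncated SVDs."""
    assert W.shape == (np.prod(in_dims), np.prod(out_dims))
    n = len(in_dims)
    # Reshape to (i1, ..., in, o1, ..., on), then interleave axes as (i1, o1, ..., in, on).
    T = W.reshape(list(in_dims) + list(out_dims))
    perm = [ax for pair in zip(range(n), range(n, 2 * n)) for ax in pair]
    T = T.transpose(perm)
    cores, bond = [], 1
    for k in range(n - 1):
        # Split off the k-th (input, output) index pair with a truncated SVD.
        mat = T.reshape(bond * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_bond, len(S))
        cores.append(U[:, :r].reshape(bond, in_dims[k], out_dims[k], r))
        # Carry the remainder forward for the next core.
        T = np.diag(S[:r]) @ Vt[:r]
        bond = r
    cores.append(T.reshape(bond, in_dims[-1], out_dims[-1], 1))
    return cores


def magnitude_prune(W, sparsity=0.5):
    """Unstructured pruning: zero out the smallest-magnitude fraction of entries."""
    k = int(sparsity * W.size)
    thresh = np.partition(np.abs(W).ravel(), k)[k]
    return np.where(np.abs(W) < thresh, 0.0, W)


# Example: a 256 x 256 weight with factor dimensions (4, 8, 8) x (4, 8, 8).
W = np.random.randn(256, 256)
cores = mpo_factorize(W, (4, 8, 8), (4, 8, 8), max_bond=16)
print("full-rank params:", W.size)                      # 65536
print("MPO params:", sum(c.size for c in cores))        # far fewer, set by the bond rank
W_pruned = magnitude_prune(W, sparsity=0.5)             # ~50% of entries zeroed
```

The parameter count of the MPO is governed by the bond rank rather than the full matrix dimensions, which is the source of the memory and compute savings described above; pruning can then be applied on top of either the full-rank or the factorized weights.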
Cite
Text
Mikhak et al. "Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Mikhak et al. "Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/mikhak2025iclrw-efficient/)
BibTeX
@inproceedings{mikhak2025iclrw-efficient,
title = {{Efficient Transformers via MPO-Based Low-Rank Factorization and Pruning}},
author = {Mikhak, Sam and Gummidi, Venkata Sai and Medepalli, Praneeth and Zhu, Kevin},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/mikhak2025iclrw-efficient/}
}