ESPACE: Dimensionality Reduction of Activations for Model Compression

Abstract

We propose ESPACE, an LLM compression technique based on dimensionality reduction of activations. Unlike prior works on weight-centric tensor decomposition, ESPACE projects activations onto a pre-calibrated set of principal components. The activation-centrality of the approach enables retraining LLMs with no loss of expressivity, while at inference, weight decomposition is obtained as a byproduct of matrix multiplication associativity. Theoretical results on the construction of projection matrices with optimal computational accuracy are provided. Experimentally, we find that ESPACE enables 50% compression of GPT3, Llama2, and Nemotron4 models with small accuracy degradation, as low as a 0.18 perplexity increase on GPT3-22B. At lower compression rates of 20% to 40%, ESPACE drives GPT3 models to outperform their baselines, by up to a 0.38 decrease in perplexity for GPT3-8B. ESPACE also reduces GEMM execution time and prefill inference latency on existing hardware. A comparison with related works on compressing Llama2-7B via matrix factorization shows that ESPACE is a first step in advancing the state of the art in tensor decomposition compression of LLMs.
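The core mechanism in the abstract, projecting activations onto calibrated principal components and folding the projection into the weights via associativity, can be sketched in a few lines. The following is a minimal PyTorch illustration under simplifying assumptions, not the paper's implementation: the projection here is plain PCA of the activations' uncentered second moment rather than the paper's accuracy-optimal construction, and the function names `calibrate_projection` and `fold_weight` are hypothetical.

```python
import torch

def calibrate_projection(activations: torch.Tensor, k: int) -> torch.Tensor:
    """Top-k principal components of calibration activations X of shape (n, d).

    Simplification: plain PCA of the uncentered second moment X^T X;
    the paper instead derives projections optimized for computational accuracy.
    """
    gram = activations.T @ activations    # (d, d), symmetric PSD
    _, eigvecs = torch.linalg.eigh(gram)  # eigenvalues in ascending order
    return eigvecs[:, -k:]                # (d, k) projection matrix P

def fold_weight(W: torch.Tensor, P: torch.Tensor) -> torch.Tensor:
    """Weight decomposition as a byproduct of associativity:
    y = x W^T ~= (x P)(P^T W^T) = (x P)(W P)^T, so W P becomes the new weight."""
    return W @ P                          # (out, k)

# Stand-in shapes and random data, for illustration only.
n, d, out, k = 8192, 1024, 4096, 512
X = torch.randn(n, d)       # calibration activations
W = torch.randn(out, d)     # pretrained linear-layer weight
P = calibrate_projection(X, k)
W_folded = fold_weight(W, P)

x = torch.randn(1, d)
y_exact = x @ W.T                  # original layer
y_approx = (x @ P) @ W_folded.T    # two skinny GEMMs with inner dim k < d
```

Since `W @ P` is precomputed offline, inference replaces one GEMM of inner dimension `d` with two of inner dimension `k`, which is consistent with the GEMM execution time and prefill latency reductions the abstract reports.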

Cite

Text

Sakr and Khailany. "ESPACE: Dimensionality Reduction of Activations for Model Compression." Neural Information Processing Systems, 2024. doi:10.52202/079017-0556

Markdown

[Sakr and Khailany. "ESPACE: Dimensionality Reduction of Activations for Model Compression." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sakr2024neurips-espace/) doi:10.52202/079017-0556

BibTeX

@inproceedings{sakr2024neurips-espace,
  title     = {{ESPACE: Dimensionality Reduction of Activations for Model Compression}},
  author    = {Sakr, Charbel and Khailany, Brucek},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0556},
  url       = {https://mlanthology.org/neurips/2024/sakr2024neurips-espace/}
}