NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models

Abstract

Large language models (LLMs) exhibit remarkable performance across various natural language processing tasks but suffer from immense computational and memory demands, limiting their deployment in resource-constrained environments. To address this challenge, we propose NoWag (Normalized Weight and Activation Guided Compression), a unified framework for zero-shot shape-preserving compression algorithms. We compressed Llama-2 7B/13B/70B and Llama-3 8B models using two popular forms of shape-preserving compression: vector quantization (NoWag-VQ) and unstructured/structured pruning (NoWag-P). We found that NoWag-VQ significantly outperforms state-of-the-art zero-shot VQ methods, and that NoWag-P performs competitively against state-of-the-art methods.
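
To make "shape-preserving" concrete, the sketch below shows a generic activation-guided unstructured pruning step in PyTorch: weight entries are scored by their magnitude scaled by per-input-channel activation norms, and the lowest-scoring entries are zeroed so the weight tensor keeps its original shape. This is only an illustrative assumption of the general idea; the function name, tensors, and scoring rule are hypothetical and not taken from the paper.

```python
# Hypothetical illustration (not the paper's actual algorithm): shape-preserving,
# activation-guided unstructured pruning. The weight keeps its shape; pruned
# entries are simply set to zero.
import torch

def activation_guided_prune(weight: torch.Tensor,
                            act_norm: torch.Tensor,
                            sparsity: float = 0.5) -> torch.Tensor:
    """weight: (out_features, in_features); act_norm: (in_features,) per-channel
    activation norms from calibration data; sparsity: fraction of weights to zero."""
    # Score each weight by |w| scaled by its input channel's activation norm.
    score = weight.abs() * act_norm.unsqueeze(0)
    # Zero out the k lowest-scoring entries, keeping the tensor shape intact.
    k = int(sparsity * weight.numel())
    threshold = score.flatten().kthvalue(k).values
    mask = score > threshold
    return weight * mask

# Usage with placeholder tensors standing in for a real layer and calibration stats.
W = torch.randn(4096, 4096)
a = torch.rand(4096) + 0.1
W_pruned = activation_guided_prune(W, a, sparsity=0.5)
print(f"achieved sparsity: {1 - W_pruned.count_nonzero().item() / W_pruned.numel():.2f}")
```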

Cite

Text

Liu et al. "NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Liu et al. "NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/liu2025iclrw-nowag/)

BibTeX

@inproceedings{liu2025iclrw-nowag,
  title     = {{NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models}},
  author    = {Liu, Lawrence Ray and Chakrabarti, Inesh and Li, Yixiao and Wang, Mengdi and Zhao, Tuo and Yang, Lin},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/liu2025iclrw-nowag/}
}