ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

Zhao, Weibo; Shi, Yubin; Lyu, Xinyu; Sui, Wanchen; Li, Shen; Li, Yong

doi:10.1609/AAAI.V39I21.34443

AAAI 2025 pp. 22822-22830

Abstract

Quantization is a pivotal technique for large language model (LLM) serving, yet effective low-bit quantization remains challenging. Because a low-bit format can represent only a limited set of values, the quantized model incurs non-trivial error, which can cause unacceptable performance degradation. Starting from the basic objective of model compression, this paper analyzes the layer-wise error distribution of LLMs under post-training quantization. We then introduce ASER, an algorithm consisting of (1) Error Reconstruction: low-rank compensation for quantization error using LoRA-style matrices constructed by whitening SVD; and (2) Activation Smoothing: outlier extraction to obtain smoother activations and better error compensation. ASER can quantize typical LLMs to low-bit precision and, in particular, preserves accuracy even in the W4A8 per-channel setup. Experimental results show that ASER is competitive with state-of-the-art quantization algorithms and shows potential for activation quantization, with minor overhead.
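The Error Reconstruction idea can be illustrated with a short NumPy sketch. It quantizes a weight matrix to 4 bits with one scale per output channel, then fits LoRA-style factors `l_a @ l_b` to the quantization error so that the error on the layer *output* over calibration activations is small. The whitening is an assumption: the sketch uses the Cholesky factor `s` of the slightly damped Gram matrix `x @ x.T`, which makes a truncated SVD of `err @ s` the best rank-`rank` fit in the norm `||(err - l_a @ l_b) @ x||_F`. The paper's exact whitening and rank selection may differ, and all function names, shapes, and the `damp` parameter are hypothetical.

```python
import numpy as np


def quantize_per_channel(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Symmetric round-to-nearest quantization, one scale per output channel (row).

    Returns de-quantized ("fake-quant") weights so the error w - w_q is easy to inspect.
    """
    q_max = 2 ** (n_bits - 1) - 1                      # 7 for INT4
    scale = np.abs(w).max(axis=1, keepdims=True) / q_max
    scale = np.where(scale == 0, 1.0, scale)           # guard all-zero rows
    return np.clip(np.round(w / scale), -q_max - 1, q_max) * scale


def whitened_lowrank_error(w, w_q, x, rank, damp=1e-6):
    """Fit l_a @ l_b to E = w - w_q so that ||(E - l_a @ l_b) @ x||_F is (near) minimal.

    Hypothetical whitening: s is the Cholesky factor of the damped Gram matrix,
    x @ x.T + damp * I = s @ s.T, so ||M @ x||_F**2 ~= ||M @ s||_F**2 (exact if damp = 0).
    By Eckart-Young, truncating the SVD of E @ s then gives the best rank-r fit.
    Shapes: w, w_q: (d_out, d_in); x: (d_in, n_tokens); l_a: (d_out, r); l_b: (r, d_in).
    """
    err = w - w_q
    gram = x @ x.T + damp * np.eye(x.shape[0])
    s = np.linalg.cholesky(gram)                       # lower-triangular, gram = s @ s.T
    u, sigma, vt = np.linalg.svd(err @ s, full_matrices=False)
    l_a = u[:, :rank] * sigma[:rank]                   # U_r diag(sigma_r)
    l_b = np.linalg.solve(s.T, vt[:rank].T).T          # V_r^T s^{-1}, without forming inv(s)
    return l_a, l_b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, n_tokens = 64, 128, 512
    w = rng.normal(size=(d_out, d_in))
    x = rng.normal(size=(d_in, n_tokens))
    x[:4] *= 20.0                                      # a few synthetic outlier activation channels
    w_q = quantize_per_channel(w, n_bits=4)
    l_a, l_b = whitened_lowrank_error(w, w_q, x, rank=16)
    # Compensated layer output: y = w_q @ x + l_a @ (l_b @ x)
    before = np.linalg.norm((w - w_q) @ x)
    after = np.linalg.norm((w - w_q - l_a @ l_b) @ x)
    print(f"output error: {before:.1f} -> {after:.1f}")
```

In this sketch the compensated layer computes `w_q @ x + l_a @ (l_b @ x)`, so the extra cost is two thin matrix products of width `rank`. Whitening matters when a few activation channels dominate: a plain SVD of the error weights all input channels equally, whereas the whitened fit spends its rank on the error directions that actually reach the output.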
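For Activation Smoothing, the abstract states only that outliers are extracted to obtain smoother activations. The sketch below substitutes a SmoothQuant-style rule, a per-channel scale `s_j = max|x_j|**alpha / max|w_j|**(1 - alpha)` applied only to the `n_outliers` channels with the largest activation range; this is a stand-in, not ASER's published criterion. Under that assumption it shows two things: the rescaling leaves `w @ x` unchanged, and it reduces the error of per-token 8-bit activation quantization (the A8 half of W4A8). The function names and the `alpha` and `n_outliers` parameters are hypothetical.

```python
import numpy as np


def smooth_outlier_channels(w, x, n_outliers=4, alpha=0.5):
    """Migrate the range of the `n_outliers` largest activation channels into the weights.

    Stand-in rule (SmoothQuant-style, not necessarily ASER's exact criterion):
    s_j = max|x_j|**alpha / max|w_j|**(1 - alpha) for outlier channels j, s_j = 1 otherwise.
    Since (w * s) @ (x / s[:, None]) == w @ x, the layer output is unchanged.
    Shapes: w: (d_out, d_in); x: (d_in, n_tokens).
    """
    act_max = np.abs(x).max(axis=1)                    # per input channel, over tokens
    w_max = np.abs(w).max(axis=0)                      # per input channel, over output rows
    idx = np.argsort(act_max)[-n_outliers:]            # channels with the largest activation range
    s = np.ones(x.shape[0])
    s[idx] = act_max[idx] ** alpha / w_max[idx] ** (1 - alpha)
    return w * s, x / s[:, None], idx                  # w * s scales column j of w by s[j]


def quantize_per_token(x, n_bits=8):
    """Symmetric round-to-nearest fake quantization with one scale per token (column)."""
    q_max = 2 ** (n_bits - 1) - 1                      # 127 for INT8
    scale = np.abs(x).max(axis=0, keepdims=True) / q_max
    scale = np.where(scale == 0, 1.0, scale)           # guard all-zero tokens
    return np.clip(np.round(x / scale), -q_max - 1, q_max) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 128))
    x = rng.normal(size=(128, 512))
    x[[3, 40, 77]] *= 50.0                             # synthetic outlier channels
    w_s, x_s, idx = smooth_outlier_channels(w, x, n_outliers=3)
    ref = w @ x
    err_plain = np.linalg.norm(w @ quantize_per_token(x) - ref)
    err_smooth = np.linalg.norm(w_s @ quantize_per_token(x_s) - ref)
    print(sorted(idx.tolist()), f"A8 error: {err_plain:.2f} -> {err_smooth:.2f}")
```

The cost of this migration is that the columns `w[:, idx]` are multiplied by `s[idx]`, which widens the weight range and makes 4-bit weight quantization harder. The abstract links smoothing to better error compensation; this sketch does not model that coupling.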

Cite

Text

Zhao et al. "ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34443

Markdown

[Zhao et al. "ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhao2025aaai-aser/) doi:10.1609/AAAI.V39I21.34443

BibTeX

@inproceedings{zhao2025aaai-aser,
  title     = {{ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization}},
  author    = {Zhao, Weibo and Shi, Yubin and Lyu, Xinyu and Sui, Wanchen and Li, Shen and Li, Yong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {22822-22830},
  doi       = {10.1609/AAAI.V39I21.34443},
  url       = {https://mlanthology.org/aaai/2025/zhao2025aaai-aser/}
}