MPPQ: Enhancing Post-Training Quantization for LLMs via Mixed Supervision, Proxy Rounding, and Pre-Searching

Abstract

Recent post-training quantization (PTQ) methods for large language models (LLMs) primarily focus on tackling the challenges caused by outliers. Scaling transformations have proven effective, yet how to improve extremely low-bitwidth (e.g., 2-bit) PTQ on top of them remains largely unexplored. In this work, we establish a new PTQ framework, MPPQ. Specifically, MPPQ first proposes an enhanced reconstruction loss based on Mixed-metric supervision, which mitigates the distribution inconsistency caused by quantization while providing strong regularization for the learnable parameters. Second, we introduce a Proxy-based adaptive rounding scheme for weight quantization, which replaces the round-to-nearest (RTN) function and minimizes the overall quantization error through element-wise scaling. Furthermore, a coarse Pre-searching mechanism for clipping factors is presented to properly coordinate the quantization and clipping patterns while providing optimal initialization of the clipping factors before training. Extensive experiments show that MPPQ consistently outperforms state-of-the-art methods in low-bit quantization settings. For instance, for the LLaMA-2-7B model quantized to W4A4, the WikiText2 perplexity drops to 8.85 (a 3.90 reduction from the 12.75 of the latest method, LRQuant).
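The following is a minimal PyTorch sketch of the two weight-quantization ideas described in the abstract, as we read them: a coarse pre-search over clipping factors and an element-wise-scaled ("proxy") rounding step optimized against a reconstruction error. Function names, hyperparameters, the straight-through estimator, and the MSE objective are our assumptions for illustration, not the authors' implementation.

import torch


def presearch_clip(w, qmax, ratios=(1.0, 0.95, 0.9, 0.85, 0.8)):
    """Coarse grid search: pick the clipping ratio whose per-channel scale gives
    the lowest RTN weight-quantization MSE, as an initialization before training."""
    best_scale, best_err = None, None
    for r in ratios:
        scale = r * w.abs().amax(dim=1, keepdim=True) / qmax
        w_hat = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
        err = torch.mean((w_hat - w) ** 2).item()
        if best_err is None or err < best_err:
            best_err, best_scale = err, scale
    return best_scale


def proxy_round(w, x_calib, scale, qmin, qmax, n_steps=300, lr=1e-2):
    """Learn an element-wise scaling of the weights before rounding so that the
    quantized layer output matches the full-precision output on calibration data."""
    s = torch.ones_like(w, requires_grad=True)   # element-wise scaling; s = 1 reproduces RTN
    opt = torch.optim.Adam([s], lr=lr)
    y_fp = x_calib @ w.t()                       # full-precision layer output
    for _ in range(n_steps):
        z = w * s / scale
        q = torch.clamp(torch.round(z), qmin, qmax)
        q = (q - z).detach() + z                 # straight-through estimator for round/clamp
        loss = torch.mean((x_calib @ (q * scale).t() - y_fp) ** 2)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        q = torch.clamp(torch.round(w * s / scale), qmin, qmax)
        return q * scale


def output_mse(x, w_q, w_fp):
    """Reconstruction error of the quantized layer output on the calibration batch."""
    return torch.mean((x @ w_q.t() - x @ w_fp.t()) ** 2).item()


if __name__ == "__main__":
    torch.manual_seed(0)
    w = torch.randn(64, 128)                     # toy linear-layer weights
    x = torch.randn(256, 128)                    # toy calibration activations
    qmin, qmax = -8, 7                           # 4-bit symmetric grid
    scale = presearch_clip(w, qmax)
    w_rtn = torch.clamp(torch.round(w / scale), qmin, qmax) * scale
    w_prx = proxy_round(w, x, scale, qmin, qmax)
    print(f"output MSE  RTN: {output_mse(x, w_rtn, w):.6f}  "
          f"proxy rounding: {output_mse(x, w_prx, w):.6f}")

In this sketch the rounding is made trainable only through the element-wise scaling tensor, which starts at one (exactly RTN) and is nudged by gradients of the layer-output reconstruction loss; the paper's actual objective additionally involves the mixed-metric supervision described above, which is not reproduced here.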

Cite

Text

Wei et al. "MPPQ: Enhancing Post-Training Quantization for LLMs via Mixed Supervision, Proxy Rounding, and Pre-Searching." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/920

Markdown

[Wei et al. "MPPQ: Enhancing Post-Training Quantization for LLMs via Mixed Supervision, Proxy Rounding, and Pre-Searching." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/wei2025ijcai-mppq/) doi:10.24963/IJCAI.2025/920

BibTeX

@inproceedings{wei2025ijcai-mppq,
  title     = {{MPPQ: Enhancing Post-Training Quantization for LLMs via Mixed Supervision, Proxy Rounding, and Pre-Searching}},
  author    = {Wei, Mingrun and Yan, Yeyu and Wang, Dong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8277-8285},
  doi       = {10.24963/IJCAI.2025/920},
  url       = {https://mlanthology.org/ijcai/2025/wei2025ijcai-mppq/}
}