DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization
Abstract
Quantization plays a crucial role in accelerating the inference of large-scale models, and rotation matrices have been shown to improve quantization performance by smoothing outliers. However, optimizing these rotations through end-to-end fine-tuning incurs high computational costs and is prone to overfitting. To address this challenge, we propose DartQuant, an efficient distribution-aware rotational calibration method that reduces the complexity of rotational optimization by constraining the distribution of the activations after rotation. This approach also reduces reliance on task-specific losses, thereby mitigating the risk of overfitting. Additionally, we introduce the QR-Orth optimization scheme, which replaces expensive alternating optimization with a more efficient solution. Across a variety of model quantization experiments, DartQuant demonstrates superior performance. Compared to existing methods, it achieves 47$\times$ acceleration and 10$\times$ memory savings for rotational optimization on a 70B model. Furthermore, it is the first method to complete rotational calibration for a 70B model on a single 3090 GPU, making quantization of large language models feasible in resource-constrained environments.
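The idea that a rotation smooths outliers without changing the layer output can be illustrated with a small NumPy sketch. This is not the DartQuant calibration procedure, and the abstract does not describe QR-Orth in detail; the sketch only assumes, as the name suggests, that an orthogonal matrix can be obtained from the QR decomposition of an unconstrained matrix. The function names `qr_orthogonal` and `quantize_int4`, the synthetic data, and the 4-bit per-row quantizer are all hypothetical choices made for illustration.

```python
import numpy as np

def qr_orthogonal(latent):
    """Map an unconstrained square matrix to an orthogonal one via QR.

    Illustrative assumption: orthogonality comes from the Q factor, so no
    explicit manifold constraint is needed on `latent`.
    """
    q, r = np.linalg.qr(latent)
    # Flip column signs so the diagonal of R is non-negative (unique factorization).
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs

def quantize_int4(x):
    """Symmetric per-row 4-bit fake quantization (quantize, then dequantize)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 128))      # synthetic activations
x[:, 5] *= 20.0                         # one synthetic outlier channel
w = rng.standard_normal((128, 32))      # synthetic weight matrix

rot = qr_orthogonal(rng.standard_normal((128, 128)))

# Rotating activations and counter-rotating weights leaves the output unchanged.
y_ref = x @ w
y_rot = (x @ rot) @ (rot.T @ w)

# Quantization error of the activations, with and without the rotation.
err_plain = np.linalg.norm(quantize_int4(x) @ w - y_ref)
err_rot = np.linalg.norm(quantize_int4(x @ rot) @ (rot.T @ w) - y_ref)
print(f"error without rotation: {err_plain:.2f}, with rotation: {err_rot:.2f}")
```

Here the rotation is random rather than calibrated. It spreads the outlier channel's energy across all dimensions, which shrinks the per-row quantization scale; a calibrated rotation such as the one the paper proposes would aim to shape the rotated distribution further.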
Cite
Text
Shao et al. "DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization." Advances in Neural Information Processing Systems, 2025.
Markdown
[Shao et al. "DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shao2025neurips-dartquant/)
BibTeX
@inproceedings{shao2025neurips-dartquant,
title = {{DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization}},
author = {Shao, Yuantian and Chen, Yuanteng and Wang, Peisong and Yu, Jianlin and Lin, Jing and Yao, Yiwu and Wei, Zhihui and Cheng, Jian},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/shao2025neurips-dartquant/}
}