DUO: No Compromise to Accuracy Degradation
Abstract
Distributed training often suffers from high communication overhead due to large-scale gradient synchronization. Although gradient compression—particularly at 4-bit or even lower precision—significantly reduces transfer volume, it typically sacrifices precision and degrades the final model accuracy. In this work, we introduce DUO, a distributed training framework designed to mitigate the accuracy degradation incurred by gradient compression without incurring additional overhead. DUO achieves this by inserting an additional high-precision gradient synchronization step into a previously computation-only phase, so that its communication is fully hidden by computation. We provide a comprehensive theoretical proof of convergence for DUO and validate its effectiveness through extensive pre-training experiments on GPT models. Our results indicate that DUO effectively restores accuracy when using 4-bit gradient compression, achieving performance comparable to uncompressed training. Remarkably, DUO maintains minimal accuracy degradation even under extreme compression scenarios, including 1-bit gradients or complete omission of the low-precision gradient communication step (0-bit transmission).
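The abstract only sketches the mechanism, so the following is a minimal NumPy sketch of the underlying idea: a 4-bit quantized gradient loses information, and a separate full-precision residual (the kind of signal an extra high-precision synchronization, hidden behind computation, could carry) recovers it. The quantizer, function names, and residual bookkeeping here are illustrative assumptions, not DUO's actual algorithm.

```python
# Illustrative sketch only (assumed quantizer and bookkeeping), not DUO's implementation.
import numpy as np

def quantize_4bit(x, levels=16):
    """Uniformly quantize a vector to `levels` buckets over its own range."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint8)  # 4-bit codes in 0..15
    return codes, lo, scale

def dequantize_4bit(codes, lo, scale):
    return lo + codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
grad = rng.normal(size=1024).astype(np.float32)           # stand-in for a local gradient

# Low-precision path: what a 4-bit gradient all-reduce would effectively transmit.
codes, lo, scale = quantize_4bit(grad)
grad_lowp = dequantize_4bit(codes, lo, scale)

# High-precision path: the residual that an additional full-precision sync,
# overlapped with computation, could deliver to correct the compressed update.
residual = grad - grad_lowp

print("error with 4-bit only      :", np.linalg.norm(grad - grad_lowp))
print("error with residual applied:", np.linalg.norm(grad - (grad_lowp + residual)))
```

The point of the sketch is only that the compression error is a well-defined quantity that a second, higher-precision communication step can transport; scheduling that step so its cost is hidden by computation is the contribution the abstract describes.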
Cite
Text
Jia et al. "DUO: No Compromise to Accuracy Degradation." Advances in Neural Information Processing Systems, 2025.
Markdown
[Jia et al. "DUO: No Compromise to Accuracy Degradation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/jia2025neurips-duo/)
BibTeX
@inproceedings{jia2025neurips-duo,
title = {{DUO: No Compromise to Accuracy Degradation}},
author = {Jia, Jinda and Xie, Cong and Lu, Hanlin and Ye, Fanjiang and Feng, Hao and Wang, Daoce and Lin, Haibin and Zhang, Zhi and Liu, Xin},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/jia2025neurips-duo/}
}