Layer-Wise Quantization for Quantized Optimistic Dual Averaging
Abstract
Modern deep neural networks are heterogeneous across their many layers of various types, such as residual and multi-head attention blocks, owing to varying structures (dimensions, activation functions, etc.) and distinct representation characteristics, which impact predictions. We develop a general layer-wise quantization framework with tight variance and code-length bounds that adapts to these heterogeneities over the course of training. We then apply this layer-wise quantization technique within distributed variational inequalities (VIs), proposing a novel Quantized Optimistic Dual Averaging (QODA) algorithm with adaptive learning rates that achieves competitive convergence rates for monotone VIs. We empirically show that QODA achieves up to a $150$% speedup over the baselines in end-to-end training time when training Wasserstein GANs on $12+$ GPUs.
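To make the layer-wise idea concrete, below is a minimal illustrative sketch of per-layer stochastic quantization, assuming a simple QSGD-style uniform scheme with per-layer norm scaling. The function names, level counts, and NumPy implementation are our own illustration, not the paper's framework or its QODA algorithm.

```python
import numpy as np

def quantize_layer(grad: np.ndarray, num_levels: int = 16, rng=None) -> np.ndarray:
    """Stochastically quantize one layer's gradient to `num_levels` uniform levels.

    Scaling by the layer's own norm is an (assumed) stand-in for the
    per-layer adaptivity discussed in the abstract: each layer is
    quantized relative to its own magnitude rather than a global scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = np.linalg.norm(grad)
    if scale == 0.0:
        return np.zeros_like(grad)
    # Map |g_i| / ||g|| into [0, num_levels] and round stochastically,
    # which keeps the quantizer unbiased in expectation.
    normalized = np.abs(grad) / scale * num_levels
    lower = np.floor(normalized)
    prob_up = normalized - lower
    levels = lower + (rng.random(grad.shape) < prob_up)
    return np.sign(grad) * levels * scale / num_levels

def quantize_model_gradients(grads_per_layer, levels_per_layer):
    # Quantize each layer separately so heterogeneous layers
    # (attention, residual blocks, ...) get their own scale and level count.
    return [quantize_layer(g, L) for g, L in zip(grads_per_layer, levels_per_layer)]
```

Because each layer is scaled by its own norm and can use its own number of levels, layers with very different magnitudes or dimensions are quantized on their own terms, which is the intuition behind adapting to layer heterogeneity.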
Cite
Text
Nguyen et al. "Layer-Wise Quantization for Quantized Optimistic Dual Averaging." Proceedings of the 42nd International Conference on Machine Learning, 2025.
BibTeX
@inproceedings{nguyen2025icml-layerwise,
title = {{Layer-Wise Quantization for Quantized Optimistic Dual Averaging}},
author = {Nguyen, Anh Duc and Markov, Ilia and Wu, Zhengqing and Ramezani-Kebrya, Ali and Antonakopoulos, Kimon and Alistarh, Dan and Cevher, Volkan},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {46026--46072},
volume = {267},
url = {https://mlanthology.org/icml/2025/nguyen2025icml-layerwise/}
}