Ternary Momentum for Quantized Training

Abstract

Quantization enables efficient inference on resource-limited devices, yet training still depends on high-precision gradients and optimizer states. We address this gap by introducing stochastic ternary momentum, a fully quantized optimizer that operates on quantized parameters, ternary gradient information, and ternary momentum states for stable, memory-efficient quantized optimization. Our method replaces deterministic, full-precision updates with integer-valued updates driven by stochastic sampling, so that expected updates match standard momentum while strict memory constraints are maintained. It eliminates re-quantization overhead and preserves quantization consistency throughout training. We establish theoretical convergence guarantees for our ternary momentum method for convex objectives over bounded integer domains and for non-convex objectives over unbounded integer domains. Experiments on vision and language tasks demonstrate that our approach retains strong performance while reducing optimizer memory by 95% compared to full precision, advancing the feasibility of fully quantized training.
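The abstract relies on stochastic sampling whose expectation matches a real-valued quantity. As a rough illustration of that general idea only, and not the paper's algorithm, the sketch below shows generic unbiased stochastic ternarization: a value x in [-1, 1] is mapped to sign(x) with probability |x| and to 0 otherwise, so the sample mean approaches x. The function name `stochastic_ternarize`, the clipping range, and the test values are assumptions made for this example.

```python
import numpy as np

def stochastic_ternarize(x, rng):
    """Sample t in {-1, 0, +1} such that E[t] = x for x in [-1, 1].

    Illustrative assumption: inputs outside [-1, 1] are clipped first.
    """
    x = np.clip(np.asarray(x, dtype=np.float64), -1.0, 1.0)
    # Emit sign(x) with probability |x|, and 0 otherwise.
    fire = rng.random(x.shape) < np.abs(x)
    return (np.sign(x) * fire).astype(np.int8)

rng = np.random.default_rng(0)
target = np.array([0.3, -0.7, 0.0, 1.0])

# Averaging many ternary samples should recover the real-valued target.
samples = np.stack([stochastic_ternarize(target, rng) for _ in range(20000)])
empirical_mean = samples.mean(axis=0)
print(empirical_mean)
```

Each sample fits in two bits, yet the average over many draws tracks the full-precision value. This is the sense in which a quantized update can match a standard update in expectation.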

Cite

Text

Bar et al. "Ternary Momentum for Quantized Training." Transactions on Machine Learning Research, 2026.

Markdown

[Bar et al. "Ternary Momentum for Quantized Training." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/bar2026tmlr-ternary/)

BibTeX

@article{bar2026tmlr-ternary,
  title     = {{Ternary Momentum for Quantized Training}},
  author    = {Bar, Noga and Attia, Amit and Moshkovitz, Michal and Di Castro, Dotan},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/bar2026tmlr-ternary/}
}