Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences

Sadiev, Abdurakhmon; Malinovsky, Grigory; Gorbunov, Eduard; Sokolov, Igor; Khaled, Ahmed; Burlachenko, Konstantin; Richtárik, Peter

doi:10.52202/079017-2685

NeurIPS 2024

Abstract

Gradient compression is a popular technique for improving the communication complexity of stochastic first-order methods in distributed training of machine learning models. However, existing works consider only with-replacement sampling of stochastic gradients. In contrast, it is well known in practice, and was recently confirmed in theory, that stochastic methods based on without-replacement sampling, e.g., the Random Reshuffling (RR) method, perform better than ones that sample the gradients with replacement. In this work, we close this gap in the literature and provide the first analysis of methods with gradient compression and without-replacement sampling. We first develop a distributed variant of random reshuffling with gradient compression (Q-RR), and show how to reduce the variance coming from gradient quantization through the use of control iterates. To better fit Federated Learning applications, we then incorporate local computation and propose a variant of Q-RR called Q-NASTYA, which uses local gradient steps and different local and global stepsizes. We also show how to reduce the compression variance in this setting. Finally, we prove convergence results for the proposed methods and outline several settings in which they improve upon existing algorithms.
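
The idea named in the title, compressing gradient differences rather than gradients, can be illustrated with a small sketch. This is not the authors' algorithm or code: it is a minimal illustration, assuming a least-squares objective, an unbiased rand-k sparsifier as the compressor, and one learned shift (control vector) per worker and data point. The names `rand_k`, `rr_compressed_differences`, `h`, `alpha`, and `gamma` are hypothetical and chosen only for this example.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased rand-k sparsifier (assumed compressor): keep k random
    coordinates of v and rescale them by d/k."""
    d = v.size
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

def rr_compressed_differences(A, b, k, gamma, epochs, seed=0):
    """Sketch of random reshuffling with compressed gradient differences.

    A: (n_workers, m, d) features, b: (n_workers, m) targets.
    Each local loss is 0.5 * (a^T x - b)^2 (an assumption for illustration).
    """
    rng = np.random.default_rng(seed)
    n, m, d = A.shape
    x = np.zeros(d)
    h = np.zeros((n, m, d))  # one shift per worker and per data point
    alpha = k / d            # shift stepsize 1/(1 + omega), omega = d/k - 1
    for _ in range(epochs):
        # Without-replacement sampling: a fresh permutation per worker per epoch.
        perms = [rng.permutation(m) for _ in range(n)]
        for step in range(m):
            g_hat = np.zeros(d)
            for i in range(n):
                j = perms[i][step]
                grad = A[i, j] * (A[i, j] @ x - b[i, j])
                # Only the compressed difference would be communicated.
                delta = rand_k(grad - h[i, j], k, rng)
                g_hat += h[i, j] + delta  # server-side gradient estimate
                h[i, j] += alpha * delta  # move the shift toward the gradient
            x -= gamma * g_hat / n
    return x

# Usage on a small synthetic, consistent linear system.
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 8, 5))
x_star = rng.normal(size=5)
b = A @ x_star
x = rr_compressed_differences(A, b, k=2, gamma=0.01, epochs=300)
print(np.linalg.norm(x - x_star))
```

Because the shifts track the per-sample gradients, the compressed quantity `grad - h[i, j]` shrinks as training progresses, so the noise injected by the compressor shrinks with it. Compressing `grad` directly would keep injecting noise proportional to the gradient norm.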

Cite

Text

Sadiev et al. "Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences." Neural Information Processing Systems, 2024. doi:10.52202/079017-2685

Markdown

[Sadiev et al. "Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sadiev2024neurips-don/) doi:10.52202/079017-2685

BibTeX

@inproceedings{sadiev2024neurips-don,
  title     = {{Don't Compress Gradients in Random Reshuffling: Compress Gradient Differences}},
  author    = {Sadiev, Abdurakhmon and Malinovsky, Grigory and Gorbunov, Eduard and Sokolov, Igor and Khaled, Ahmed and Burlachenko, Konstantin and Richtárik, Peter},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2685},
  url       = {https://mlanthology.org/neurips/2024/sadiev2024neurips-don/}
}