Distributed Adaptive Optimization with Divisible Communication
Abstract
Synchronous distributed training can scale the training of deep neural networks to large-scale data, so it has been widely adopted in large-scale applications. However, it often suffers from a communication bottleneck, and many methods have been proposed to reduce the communication cost. These communication reduction methods often lead to poor performance for adaptive optimizers, largely due to their non-linearity. To address this challenging issue, we propose a novel method to divide the communication into foreground and background communication. The foreground communication is more informative but can be made low-cost to achieve communication efficiency, while the background communication runs in the background and requires no synchronization time. We use Adam as the base optimizer and achieve a $\times 1024$ foreground compression ratio on CIFAR-10, $\times 128$ on non-iid CIFAR-10, $\times 64$ on the ImageNet image classification task, and $\times 128$ on the WMT'16 EN-DE machine translation task with comparable performance, which leads to $\times 7$, $\times 6.4$, $\times 3.5$, and $\times 7$ training speedup, respectively. Moreover, we provide rigorous theoretical analysis to prove that our method obtains the same convergence rate as Adam and achieves linear speedup with respect to the number of workers.
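The core idea in the abstract is to split each local update into a small, informative "foreground" part that is synchronized on the critical path and a "background" residual that is communicated asynchronously. The sketch below illustrates one plausible way to perform such a split using top-k selection by magnitude; the paper's exact compression scheme may differ, and `split_update`, `k`, and the top-k choice are assumptions for illustration only.

```python
import numpy as np

def split_update(update: np.ndarray, k: int):
    """Split a local update into a 'foreground' part (top-k entries by
    magnitude, synchronized immediately at low cost) and a 'background'
    residual (sent asynchronously, off the critical path).

    Illustrative sketch of the foreground/background division, not the
    paper's exact algorithm.
    """
    flat = update.ravel()
    # Indices of the k largest-magnitude entries: the informative part.
    idx = np.argsort(np.abs(flat))[-k:]
    foreground = np.zeros_like(flat)
    foreground[idx] = flat[idx]
    # Residual carries everything else; no synchronization time needed.
    background = flat - foreground
    return foreground.reshape(update.shape), background.reshape(update.shape)

update = np.array([[0.1, -2.0], [0.5, 3.0]])
fg, bg = split_update(update, k=2)
# Foreground keeps only the 2 largest-magnitude entries; the split is lossless:
assert np.count_nonzero(fg) == 2
assert np.allclose(fg + bg, update)
```

Because the foreground carries only `k` of the entries, a `k` much smaller than the parameter count yields compression ratios like the $\times 1024$ figure reported for CIFAR-10, while the background residual preserves the remaining information without adding synchronization time.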
Cite
Text
Xu and Bai. "Distributed Adaptive Optimization with Divisible Communication." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43418-1_39
Markdown
[Xu and Bai. "Distributed Adaptive Optimization with Divisible Communication." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/xu2023ecmlpkdd-distributed/) doi:10.1007/978-3-031-43418-1_39
BibTeX
@inproceedings{xu2023ecmlpkdd-distributed,
title = {{Distributed Adaptive Optimization with Divisible Communication}},
author = {Xu, An and Bai, Yang},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2023},
pages = {654-670},
doi = {10.1007/978-3-031-43418-1_39},
url = {https://mlanthology.org/ecmlpkdd/2023/xu2023ecmlpkdd-distributed/}
}