Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Abstract

Conventional distributed Graph Neural Network (GNN) training relies either on inter-instance communication or periodic fallback to centralized training, both of which create overhead and constrain scalability. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from locally available parts of the training graph, and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that it is not essential to maximize the recovery of cross-instance node dependencies to achieve performance parity with centralized training. Instead, our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to graph partitions, which enhances data uniformity and minimizes discrepancies in gradients and loss values across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31x speedup over the fastest baseline, despite using less training data.
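
The sketch below illustrates the training loop the abstract describes: nodes are assigned to partitions uniformly at random, each trainer fits a local model on its own induced subgraph without any inter-trainer communication, and trainers synchronize only by periodically averaging model parameters. This is a minimal illustration, not the authors' implementation: the toy graph, the one-layer GCN-style model, the hyperparameters, and the synchronous round-based averaging (standing in for the paper's asynchronous, time-based aggregation) are all assumptions made for clarity.

```python
# Minimal sketch (assumed details, not the authors' code) of randomized partitioning,
# independent local training, and periodic model aggregation.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy graph: random features, labels, and a symmetric adjacency matrix (assumed data).
num_nodes, num_feats, num_classes, num_parts = 200, 16, 4, 4
x = torch.randn(num_nodes, num_feats)
y = torch.randint(0, num_classes, (num_nodes,))
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()
adj = ((adj + adj.t()) > 0).float() + torch.eye(num_nodes)  # symmetrize, add self-loops

class OneLayerGNN(nn.Module):
    """Minimal GCN-style model: mean neighborhood aggregation followed by a linear layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, feats, sub_adj):
        deg = sub_adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return self.lin(sub_adj @ feats / deg)

# (1) Randomized partitioning: assign each node to a trainer uniformly at random.
part_id = torch.randint(0, num_parts, (num_nodes,))

global_model = OneLayerGNN(num_feats, num_classes)
local_models = [copy.deepcopy(global_model) for _ in range(num_parts)]

def average_models(models, target):
    """Model aggregation: overwrite `target` with the element-wise mean of local parameters."""
    with torch.no_grad():
        for name, param in target.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))

num_rounds, local_steps = 5, 10
for rnd in range(num_rounds):
    # (2) Independent local training: each trainer only sees its partition's induced
    # subgraph, so cross-partition edges are dropped (no inter-trainer communication).
    for p, model in enumerate(local_models):
        idx = (part_id == p).nonzero(as_tuple=True)[0]
        sub_x, sub_y = x[idx], y[idx]
        sub_adj = adj[idx][:, idx]
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(local_steps):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(sub_x, sub_adj), sub_y)
            loss.backward()
            opt.step()

    # (3) Periodic aggregation: average local models into the global model, then broadcast back.
    average_models(local_models, global_model)
    for model in local_models:
        model.load_state_dict(global_model.state_dict())

print("finished", num_rounds, "aggregation rounds")
```

Because aggregation happens only on a fixed schedule and partitions are formed at random (so each trainer sees a statistically similar slice of the graph), the local gradients stay close across instances, which is the intuition the abstract appeals to when arguing that recovering cross-instance edges is not essential.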

Cite

Text

Zhu et al. "Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation." ICML 2023 Workshops: LLW, 2023.

Markdown

[Zhu et al. "Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation." ICML 2023 Workshops: LLW, 2023.](https://mlanthology.org/icmlw/2023/zhu2023icmlw-simplifying/)

BibTeX

@inproceedings{zhu2023icmlw-simplifying,
  title     = {{Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation}},
  author    = {Zhu, Jiong and Reganti, Aishwarya Naresh and Huang, Edward W and Dickens, Charles Andrew and Rao, Nikhil and Subbian, Karthik and Koutra, Danai},
  booktitle = {ICML 2023 Workshops: LLW},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/zhu2023icmlw-simplifying/}
}