Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation

Abstract

Conventional distributed Graph Neural Network (GNN) training relies either on inter-instance communication or periodic fallback to centralized training, both of which create overhead and constrain scalability. In this work, we propose a streamlined framework for distributed GNN training that eliminates these costly operations, yielding improved scalability, convergence speed, and performance over state-of-the-art approaches. Our framework (1) comprises independent trainers that asynchronously learn local models from locally available parts of the training graph, and (2) synchronizes these local models only through periodic (time-based) model aggregation. Contrary to prevailing belief, our theoretical analysis shows that it is not essential to maximize the recovery of cross-instance node dependencies to achieve performance parity with centralized training. Instead, our framework leverages randomized assignment of nodes or super-nodes (i.e., collections of original nodes) to graph partitions, which enhances data uniformity and minimizes discrepancies in gradients and loss values across instances. Experiments on social and e-commerce networks with up to 1.3 billion edges show that our proposed framework achieves state-of-the-art performance and a 2.31x speedup over the fastest baseline, despite using less training data.
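
The sketch below illustrates the training loop the abstract describes: nodes are assigned to partitions uniformly at random, each trainer fits a local model on its own induced subgraph without any inter-trainer communication, and trainers synchronize only by periodically averaging model parameters. This is a minimal illustration, not the authors' implementation: the toy graph, the one-layer GCN-style model, the hyperparameters, and the synchronous round-based averaging (standing in for the paper's asynchronous, time-based aggregation) are all assumptions made for clarity.

```python
# Minimal sketch (assumed details, not the authors' code) of randomized partitioning,
# independent local training, and periodic model aggregation.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy graph: random features, labels, and a symmetric adjacency matrix (assumed data).
num_nodes, num_feats, num_classes, num_parts = 200, 16, 4, 4
x = torch.randn(num_nodes, num_feats)
y = torch.randint(0, num_classes, (num_nodes,))
adj = (torch.rand(num_nodes, num_nodes) < 0.05).float()
adj = ((adj + adj.t()) > 0).float() + torch.eye(num_nodes)  # symmetrize, add self-loops

class OneLayerGNN(nn.Module):
    """Minimal GCN-style model: mean neighborhood aggregation followed by a linear layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, feats, sub_adj):
        deg = sub_adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return self.lin(sub_adj @ feats / deg)

# (1) Randomized partitioning: assign each node to a trainer uniformly at random.
part_id = torch.randint(0, num_parts, (num_nodes,))

global_model = OneLayerGNN(num_feats, num_classes)
local_models = [copy.deepcopy(global_model) for _ in range(num_parts)]

def average_models(models, target):
    """Model aggregation: overwrite `target` with the element-wise mean of local parameters."""
    with torch.no_grad():
        for name, param in target.named_parameters():
            stacked = torch.stack([dict(m.named_parameters())[name] for m in models])
            param.copy_(stacked.mean(dim=0))

num_rounds, local_steps = 5, 10
for rnd in range(num_rounds):
    # (2) Independent local training: each trainer only sees its partition's induced
    # subgraph, so cross-partition edges are dropped (no inter-trainer communication).
    for p, model in enumerate(local_models):
        idx = (part_id == p).nonzero(as_tuple=True)[0]
        sub_x, sub_y = x[idx], y[idx]
        sub_adj = adj[idx][:, idx]
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(local_steps):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(sub_x, sub_adj), sub_y)
            loss.backward()
            opt.step()

    # (3) Periodic aggregation: average local models into the global model, then broadcast back.
    average_models(local_models, global_model)
    for model in local_models:
        model.load_state_dict(global_model.state_dict())

print("finished", num_rounds, "aggregation rounds")
```

Because aggregation happens only on a fixed schedule and partitions are formed at random (so each trainer sees a statistically similar slice of the graph), the local gradients stay close across instances, which is the intuition the abstract appeals to when arguing that recovering cross-instance edges is not essential.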

Cite

Text

Zhu et al. "Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation." ICML 2023 Workshops: LLW, 2023.

Markdown

[Zhu et al. "Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation." ICML 2023 Workshops: LLW, 2023.](https://mlanthology.org/icmlw/2023/zhu2023icmlw-simplifying/)

BibTeX

@inproceedings{zhu2023icmlw-simplifying,
  title     = {{Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation}},
  author    = {Zhu, Jiong and Reganti, Aishwarya Naresh and Huang, Edward W and Dickens, Charles Andrew and Rao, Nikhil and Subbian, Karthik and Koutra, Danai},
  booktitle = {ICML 2023 Workshops: LLW},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/zhu2023icmlw-simplifying/}
}