Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks (Extended Abstract)
Abstract
Graph neural networks (GNNs) have emerged due to their success at modeling graph data. Yet, it is challenging for GNNs to efficiently scale to large graphs. Thus, distributed GNNs come into play. To avoid communication caused by expensive data movement between workers, we propose SANCUS, a staleness-aware communication-avoiding decentralized GNN system. By introducing a set of novel bounded embedding staleness metrics and adaptively skipping broadcasts, SANCUS abstracts decentralized GNN processing as sequential matrix multiplication and uses historical embeddings via cache. Theoretically, we show bounded approximation errors of embeddings and gradients with convergence guarantee. Empirically, we evaluate SANCUS with common GNN models via different system setups on large-scale benchmark datasets. Compared to SOTA works, SANCUS can avoid up to 74% communication with at least 1:86_ faster throughput on average without accuracy loss.
Cite
Text
Peng et al. "Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks (Extended Abstract)." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/724Markdown
[Peng et al. "Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks (Extended Abstract)." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/peng2023ijcai-sancus/) doi:10.24963/IJCAI.2023/724BibTeX
@inproceedings{peng2023ijcai-sancus,
title = {{Sancus: Staleness-Aware Communication-Avoiding Full-Graph Decentralized Training in Large-Scale Graph Neural Networks (Extended Abstract)}},
author = {Peng, Jingshu and Chen, Zhao and Shao, Yingxia and Shen, Yanyan and Chen, Lei and Cao, Jiannong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {6480-6485},
doi = {10.24963/IJCAI.2023/724},
url = {https://mlanthology.org/ijcai/2023/peng2023ijcai-sancus/}
}