PipeQS: Pipeline-Based Adaptive Quantization and Staleness-Aware Distributed GNN Training System

Abstract

Graph Neural Networks (GNNs) have emerged as the state-of-the-art method for graph-based learning tasks. However, training GNNs at scale remains challenging, limiting the exploration of more sophisticated GNN architectures and their application to large real-world graphs. In distributed GNN training, communication overhead and waiting times have become major performance bottlenecks. To address these challenges, we propose PipeQS, an adaptive quantization and staleness-aware pipeline distributed training system for GNNs. PipeQS dynamically adjusts the bit-width of message quantization and manages staleness to reduce both communication overhead and communication waiting time. By detecting pipeline bottlenecks caused by synchronization and utilizing cached communication to bypass message delays, PipeQS significantly improves training efficiency. Experimental results validate the effectiveness of PipeQS, showing up to an 8.3$\times$ improvement in throughput while maintaining full-graph accuracy. Furthermore, our theoretical analysis demonstrates fast convergence at a rate of $O(T^{-\frac{1}{2}})$, where $T$ is the total number of training epochs. PipeQS achieves a well-balanced trade-off between training speed and accuracy, significantly reducing training time without compromising performance. The code is available at https://github.com/suupahako/PipeQS-code.
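The abstract does not spell out the quantization scheme, so the following is only a minimal sketch of why bit-width matters for message quantization. It assumes plain uniform min-max quantization; the function names `quantize`, `dequantize`, and `max_error` are hypothetical and are not taken from the PipeQS code. Fewer bits shrink the payload but raise the worst-case reconstruction error, which is the trade-off an adaptive scheme would have to balance.

```python
def quantize(values, bits):
    """Uniformly quantize floats to integer codes in [0, 2**bits - 1].

    Illustrative assumption: min-max uniform quantization, not
    necessarily the scheme used by PipeQS.
    """
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant message, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def max_error(values, bits):
    """Worst-case absolute reconstruction error at a given bit-width."""
    codes, lo, scale = quantize(values, bits)
    restored = dequantize(codes, lo, scale)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    message = [0.0, 0.1, 0.5, 0.9, 1.0]
    for bits in (2, 4, 8):
        print(bits, max_error(message, bits))
```

With uniform quantization the error per value is bounded by half a quantization step, so each extra bit roughly halves the bound while adding one bit per transmitted value.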

Cite

Text

Wu et al. "PipeQS: Pipeline-Based Adaptive Quantization and Staleness-Aware Distributed GNN Training System." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-05981-9_31

Markdown

[Wu et al. "PipeQS: Pipeline-Based Adaptive Quantization and Staleness-Aware Distributed GNN Training System." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/wu2025ecmlpkdd-pipeqs/) doi:10.1007/978-3-032-05981-9_31

BibTeX

@inproceedings{wu2025ecmlpkdd-pipeqs,
  title     = {{PipeQS: Pipeline-Based Adaptive Quantization and Staleness-Aware Distributed GNN Training System}},
  author    = {Wu, Donghang and Shen, Lian and Jiang, Changzhi and Li, Yanhao and Liu, Xiangrong},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {527-543},
  doi       = {10.1007/978-3-032-05981-9_31},
  url       = {https://mlanthology.org/ecmlpkdd/2025/wu2025ecmlpkdd-pipeqs/}
}