Exploring Asynchronism in SWARM Parallelism

Abstract

SWARM parallelism is a framework that enhances pipeline parallelism in distributed training by incorporating fault tolerance. However, the synchronous nature of this approach introduces inefficiencies that can hinder performance and scalability. We analyze these inefficiencies and propose an asynchronous modification to the framework that enables nodes to perform local updates and periodically average their states. Our results demonstrate that this modified asynchronous SWARM achieves higher throughput without sacrificing model convergence.
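
The core idea of local updates followed by periodic state averaging can be illustrated with a toy sketch. This is not the authors' implementation: the function name, the scalar quadratic objective, the per-worker targets, and all hyperparameters below are assumptions chosen only for illustration. Pipeline stages and fault tolerance are not modeled.

```python
import random


def local_sgd_with_periodic_averaging(num_workers=4, local_steps=8,
                                      rounds=25, lr=0.1, seed=0):
    """Toy illustration (hypothetical setup, not the paper's method).

    Each worker i minimizes 0.5 * (w - target_i)^2 with noisy gradients,
    taking `local_steps` updates without communication, after which all
    workers replace their parameter with the average across workers.
    """
    rng = random.Random(seed)
    # Assumed per-worker targets; the optimum of the averaged objective
    # is the mean of these targets.
    targets = [rng.uniform(-1.0, 1.0) for _ in range(num_workers)]
    weights = [0.0] * num_workers  # all workers start from the same state

    for _round in range(rounds):
        # Local phase: each worker updates independently.
        for i in range(num_workers):
            for _step in range(local_steps):
                noise = rng.gauss(0.0, 0.01)
                grad = (weights[i] - targets[i]) + noise
                weights[i] -= lr * grad
        # Averaging phase: workers synchronize to the mean state.
        mean_w = sum(weights) / num_workers
        weights = [mean_w] * num_workers

    return weights[0], sum(targets) / num_workers


if __name__ == "__main__":
    w, optimum = local_sgd_with_periodic_averaging()
    print(f"averaged parameter: {w:.4f}, mean target: {optimum:.4f}")
```

In this sketch, raising `local_steps` reduces how often workers communicate, at the cost of letting their states drift further apart between averaging rounds. That trade-off between throughput and convergence is the one the abstract refers to.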

Cite

Text

Zuo et al. "Exploring Asynchronism in SWARM Parallelism." ICLR 2025 Workshops: MCDC, 2025.

Markdown

[Zuo et al. "Exploring Asynchronism in SWARM Parallelism." ICLR 2025 Workshops: MCDC, 2025.](https://mlanthology.org/iclrw/2025/zuo2025iclrw-exploring/)

BibTeX

@inproceedings{zuo2025iclrw-exploring,
  title     = {{Exploring Asynchronism in SWARM Parallelism}},
  author    = {Zuo, Yan and Avraham, Gil and Ajanthan, Thalaiyasingam and Ramasinghe, Sameera and Long, Alexander},
  booktitle = {ICLR 2025 Workshops: MCDC},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/zuo2025iclrw-exploring/}
}