Cross Model Parallelism for Faster Bidirectional Training of Large Convolutional Neural Networks

Abstract

Large convolutional neural networks (CNNs) have been successful in data mining tasks, but training these large-scale models is hard. Model parallelism (MP) partitions a large CNN across several workers (GPUs) so that it fits in memory, but its computational efficiency is low: only one worker is active at a time while the others sit idle during training. Pipeline model parallelism (PMP) improves on model parallelism by pipelining mini-batches, checkpointing some intermediate activations, and using delayed backward error gradients. However, each of these techniques has limitations, adds computation cost, and may degrade model performance. To address these issues and improve the efficiency of model parallelism, we propose a novel cross model parallelism (CMP) method that requires no additional computation overhead and does not jeopardize performance. In cross model parallelism, we place two models on the workers in reverse order and train them bidirectionally at the same time to improve training throughput. We also propose a novel averaging method to synchronize the two models. Theoretical analysis shows that cross model parallelism converges as fast as model parallelism in terms of training epochs. Extensive deep learning experiments show that our proposed cross model parallelism achieves a speedup of up to $1.5\times$ over model parallelism in terms of training time.
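To make the idea concrete, here is a minimal toy sketch of the scheme the abstract describes: two replicas of a two-stage model are placed on two workers in opposite orders (so while replica A's active stage occupies worker 0, replica B's occupies worker 1), and the replicas are periodically synchronized by parameter averaging. All names, the toy loss, and the sync interval below are our own illustrative assumptions, not details from the paper; the paper's actual averaging method is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two replicas of a model split into two "stages" (e.g., halves of a CNN).
# Replica A is placed as (stage0 -> worker0, stage1 -> worker1); replica B is
# placed in reverse, so at any moment both workers have a stage to compute.
replica_a = {"stage0": rng.normal(size=4), "stage1": rng.normal(size=4)}
replica_b = {"stage0": rng.normal(size=4), "stage1": rng.normal(size=4)}

def train_step(replica, lr=0.1):
    """Toy update standing in for one forward/backward pass.

    Uses the gradient of the toy loss 0.5 * ||w||^2, i.e. grad = w.
    """
    for name, w in replica.items():
        grad = w
        replica[name] = w - lr * grad

def synchronize(a, b):
    """Average the two replicas' parameters (a stand-in for CMP's sync)."""
    for name in a:
        avg = 0.5 * (a[name] + b[name])
        a[name] = avg.copy()
        b[name] = avg.copy()

# Both replicas train concurrently; every few steps they are averaged so the
# two copies track a single shared model.
for step in range(10):
    train_step(replica_a)
    train_step(replica_b)
    if (step + 1) % 5 == 0:
        synchronize(replica_a, replica_b)

# After the final synchronization the two replicas hold identical parameters.
assert all(np.allclose(replica_a[k], replica_b[k]) for k in replica_a)
```

The point of the reversed placement is purely scheduling: with a single replica, only one worker computes at a time; with two replicas placed in opposite orders, the two pipelines interleave and both workers stay busy, which is where the reported up-to-$1.5\times$ throughput gain would come from.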

Cite

Text

Xu and Bai. "Cross Model Parallelism for Faster Bidirectional Training of Large Convolutional Neural Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023. doi:10.1007/978-3-031-43418-1_38

Markdown

[Xu and Bai. "Cross Model Parallelism for Faster Bidirectional Training of Large Convolutional Neural Networks." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2023.](https://mlanthology.org/ecmlpkdd/2023/xu2023ecmlpkdd-cross/) doi:10.1007/978-3-031-43418-1_38

BibTeX

@inproceedings{xu2023ecmlpkdd-cross,
  title     = {{Cross Model Parallelism for Faster Bidirectional Training of Large Convolutional Neural Networks}},
  author    = {Xu, An and Bai, Yang},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2023},
  pages     = {637--653},
  doi       = {10.1007/978-3-031-43418-1_38},
  url       = {https://mlanthology.org/ecmlpkdd/2023/xu2023ecmlpkdd-cross/}
}