Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
Abstract
We investigate the theoretical limits of pipeline parallel optimization of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we propose a novel algorithm, coined Pipeline Parallel Random Smoothing (PPRS), that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While convergence still obeys a slow $\varepsilon^{-2}$ rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and a convergence time that only slightly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, on difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent when the sample size is limited, as in few-shot or adversarial learning.
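The core idea behind PPRS is randomized smoothing: the non-smooth objective $f$ is replaced by the smoothed surrogate $f_\gamma(x) = \mathbb{E}[f(x + \gamma Z)]$ with $Z \sim \mathcal{N}(0, I)$, whose gradient can be estimated by averaging (sub)gradients at Gaussian-perturbed points, and accelerated gradient descent is then run on this surrogate. The snippet below is a minimal single-machine sketch in Python/NumPy, not the paper's implementation: the function and parameter names (`grad_f`, `gamma`, `n_samples`, etc.) are illustrative, and in the actual pipeline-parallel setting the perturbed gradient evaluations are computed concurrently across the layer pipeline rather than sequentially as here.

```python
import numpy as np

def smoothed_gradient(grad_f, x, gamma, n_samples, rng):
    """Monte Carlo estimate of the gradient of the smoothed objective
    f_gamma(x) = E[f(x + gamma * Z)], Z ~ N(0, I): average (sub)gradients
    of f at Gaussian-perturbed points around x."""
    g = np.zeros_like(x)
    for _ in range(n_samples):
        z = rng.standard_normal(x.shape[0])
        g += grad_f(x + gamma * z)
    return g / n_samples

def pprs_sketch(grad_f, x0, n_iters, gamma, lr, momentum, n_samples, seed=0):
    """Toy stand-in for PPRS: Nesterov-style accelerated steps on the
    randomized-smoothing surrogate. Hyperparameters are placeholders,
    not the schedules analyzed in the paper."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), x0.copy()
    for _ in range(n_iters):
        g = smoothed_gradient(grad_f, y, gamma, n_samples, rng)
        x_next = y - lr * g                      # gradient step on surrogate
        y = x_next + momentum * (x_next - x)     # Nesterov extrapolation
        x = x_next
    return x
```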
Cite
Text
Colin et al. "Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning." Neural Information Processing Systems, 2019.
Markdown
[Colin et al. "Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/colin2019neurips-theoretical/)
BibTeX
@inproceedings{colin2019neurips-theoretical,
title = {{Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning}},
author = {Colin, Igor and Dos Santos, Ludovic and Scaman, Kevin},
booktitle = {Neural Information Processing Systems},
year = {2019},
pages = {12371--12380},
url = {https://mlanthology.org/neurips/2019/colin2019neurips-theoretical/}
}