Efficient Algorithms for Device Placement of DNN Graph Operators

Abstract

Modern machine learning workloads use large models with complex structures that are very expensive to execute. The devices that execute complex models are becoming increasingly heterogeneous as we see a flourishing of Domain Specific Architectures (DSAs) being offered as hardware accelerators in addition to CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices, which is fed a stream of samples and yields high throughput for training and inference of DNNs. However, for such settings (large models and multiple heterogeneous devices), we require automated algorithms and toolchains that can partition the ML workload across devices.
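
To make the pipeline-partitioning idea concrete, here is a minimal sketch (not the paper's algorithm) of one core subproblem: splitting a linear chain of operators into contiguous stages across k devices so that the slowest stage, which bounds pipeline throughput, is as fast as possible. The operator costs, device count, and function name below are hypothetical illustrations; the paper's algorithms additionally handle general DAGs, per-device memory limits, and communication costs, all omitted here.

# Toy pipeline device placement: partition a linear chain of DNN operators
# into k contiguous stages, minimizing the maximum per-stage compute time
# (the pipeline bottleneck, which determines steady-state throughput).
# This is a simplified sketch; the paper addresses a richer setting
# (general DAGs, memory constraints, communication costs).

def min_bottleneck_split(costs, k):
    """Partition `costs` into at most k contiguous stages, minimizing the
    maximum stage sum. Binary-searches the bottleneck value and checks
    feasibility greedily: O(n log(sum(costs))) for n operators."""
    lo, hi = max(costs), sum(costs)

    def stages_needed(limit):
        # Greedily pack operators into stages without exceeding `limit`.
        stages, current = 1, 0
        for c in costs:
            if current + c > limit:
                stages += 1
                current = c
            else:
                current += c
        return stages

    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= k:
            hi = mid
        else:
            lo = mid + 1
    return lo  # smallest achievable bottleneck (max per-stage time)

if __name__ == "__main__":
    op_costs = [4, 2, 7, 1, 5, 3, 6]  # hypothetical per-operator times (ms)
    num_devices = 3
    bottleneck = min_bottleneck_split(op_costs, num_devices)
    print(f"Best bottleneck with {num_devices} devices: {bottleneck} ms")

With the made-up costs above, the best 3-way split is [4, 2, 7], [1, 5, 3], [6], giving a bottleneck of 13 ms. Note that this contiguous-chain restriction is exactly what makes the greedy feasibility check exact; placing operators of a general DAG, as the paper does, requires substantially more machinery.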

Cite

Text

Tarnawski et al. "Efficient Algorithms for Device Placement of DNN Graph Operators." Neural Information Processing Systems, 2020.

Markdown

[Tarnawski et al. "Efficient Algorithms for Device Placement of DNN Graph Operators." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/tarnawski2020neurips-efficient/)

BibTeX

@inproceedings{tarnawski2020neurips-efficient,
  title     = {{Efficient Algorithms for Device Placement of DNN Graph Operators}},
  author    = {Tarnawski, Jakub M and Phanishayee, Amar and Devanur, Nikhil and Mahajan, Divya and Paravecino, Fanny Nina},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/tarnawski2020neurips-efficient/}
}