Spotlight: Optimizing Device Placement for Training Deep Neural Networks

Abstract

Training deep neural networks (DNNs) requires an increasing amount of computational resources, and it has become typical to use a mixture of GPU and CPU devices. Due to the heterogeneity of these devices, a recent challenge is how to place each operation in a neural network on these devices so that training takes the shortest possible time. The current state-of-the-art solution uses reinforcement learning based on the policy gradient method, and it suffers from suboptimal training times. In this paper, we propose Spotlight, a new reinforcement learning algorithm based on proximal policy optimization, designed specifically for finding an optimal device placement for training DNNs. The design of our new algorithm relies upon a new model of the device placement problem: by modeling it as a Markov decision process with multiple stages, we are able to prove that Spotlight achieves a theoretical guarantee on performance improvements. We have implemented Spotlight in the CIFAR-10 benchmark and deployed it on the Google Cloud platform. Extensive experiments have demonstrated that the training time with placements recommended by Spotlight is 60.9% of that with placements recommended by the policy gradient method.
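
As a rough illustration of the proximal policy optimization idea mentioned in the abstract, the sketch below computes the standard PPO clipped surrogate objective over a batch of sampled placements. It is not the paper's implementation: the function names, the use of negative per-step training time as the reward, the batch-mean baseline, and the default epsilon of 0.2 are all assumptions made for illustration.

```python
import math


def advantages_from_times(step_times):
    """Turn measured per-step training times into advantages.

    Assumption: reward = -time (faster placements score higher), and the
    batch mean of the rewards is used as a simple baseline.
    """
    rewards = [-t for t in step_times]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


def clipped_surrogate(new_logps, old_logps, advantages, epsilon=0.2):
    """Mean PPO clipped surrogate objective over sampled placements.

    new_logps / old_logps are log-probabilities of each sampled placement
    under the updated and the sampling policy, respectively.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Two hypothetical placements with measured step times in seconds.
    adv = advantages_from_times([1.0, 3.0])
    print(clipped_surrogate([math.log(2.0), 0.0], [0.0, 0.0], adv))
```

Clipping the probability ratio bounds how much credit a single update can take from any one sampled placement, which is the general mechanism PPO uses to keep policy updates conservative.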

Cite

Text

Gao et al. "Spotlight: Optimizing Device Placement for Training Deep Neural Networks." International Conference on Machine Learning, 2018.

Markdown

[Gao et al. "Spotlight: Optimizing Device Placement for Training Deep Neural Networks." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/gao2018icml-spotlight/)

BibTeX

@inproceedings{gao2018icml-spotlight,
  title     = {{Spotlight: Optimizing Device Placement for Training Deep Neural Networks}},
  author    = {Gao, Yuanxiang and Chen, Li and Li, Baochun},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {1676--1684},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/gao2018icml-spotlight/}
}