Survey on Efficient Training of Large Neural Networks

Abstract

Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models don’t fit on a single GPU or can be trained only with a small per-GPU batch size. This survey provides a systematic overview of approaches that enable more efficient DNN training. We analyze techniques that save memory and make good use of computation and communication resources on architectures with one or several GPUs. We summarize the main categories of strategies and compare strategies within and across categories. Along with approaches proposed in the literature, we discuss available implementations.

Cite

Text

Gusak et al. "Survey on Efficient Training of Large Neural Networks." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/769

Markdown

[Gusak et al. "Survey on Efficient Training of Large Neural Networks." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/gusak2022ijcai-survey/) doi:10.24963/IJCAI.2022/769

BibTeX

@inproceedings{gusak2022ijcai-survey,
  title     = {{Survey on Efficient Training of Large Neural Networks}},
  author    = {Gusak, Julia and Cherniuk, Daria and Shilova, Alena and Katrutsa, Alexandr and Bershatsky, Daniel and Zhao, Xunyi and Eyraud-Dubois, Lionel and Shliazhko, Oleh and Dimitrov, Denis and Oseledets, Ivan V. and Beaumont, Olivier},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {5494--5501},
  doi       = {10.24963/IJCAI.2022/769},
  url       = {https://mlanthology.org/ijcai/2022/gusak2022ijcai-survey/}
}