Training Deep Neural Networks with 8-Bit Floating Point Numbers
Abstract
The state-of-the-art hardware platforms for training deep neural networks are moving from traditional single precision (32-bit) computations towards 16 bits of precision - in large part due to the high energy efficiency and smaller bit storage associated with using reduced-precision representations. However, unlike inference, training with numbers represented with less than 16 bits has been challenging due to the need to maintain fidelity of the gradient computations during back-propagation. Here we demonstrate, for the first time, the successful training of deep neural networks using 8-bit floating point numbers while fully maintaining the accuracy on a spectrum of deep learning models and datasets. In addition to reducing the data and computation precision to 8 bits, we also successfully reduce the arithmetic precision for additions (used in partial product accumulation and weight updates) from 32 bits to 16 bits through the introduction of a number of key ideas including chunk-based accumulation and floating point stochastic rounding. The use of these novel techniques lays the foundation for a new generation of hardware training platforms with the potential for 2-4 times improved throughput over today's systems.
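The chunk-based accumulation idea mentioned above can be illustrated with a minimal NumPy sketch, shown below. This is an illustration of the accumulation principle only, not the authors' FP8 hardware design: standard NumPy has no 8-bit float type, so the sketch emulates only the reduced-precision additions using float16 accumulators, the chunk size of 64 is chosen for illustration, and stochastic rounding is omitted. Names such as `naive_fp16_sum` and `chunked_fp16_sum` are hypothetical helpers introduced here.

```python
import numpy as np


def naive_fp16_sum(x):
    """Accumulate every term into one float16 running sum.
    Once the sum grows large, small addends fall below half a unit in the
    last place of the accumulator and are rounded away ("swamping")."""
    acc = np.float16(0.0)
    for v in x:
        acc = acc + np.float16(v)
    return float(acc)


def chunked_fp16_sum(x, chunk_size=64):
    """Chunk-based accumulation: sum fixed-size chunks in float16, then
    add the per-chunk partial sums, also in float16. Each running sum
    stays short, so the accumulator never dwarfs the incoming addend."""
    partials = []
    for start in range(0, len(x), chunk_size):
        acc = np.float16(0.0)
        for v in x[start:start + chunk_size]:
            acc = acc + np.float16(v)
        partials.append(acc)
    total = np.float16(0.0)
    for p in partials:
        total = total + p
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 16,384 small positive terms: the exact sum is roughly 80, well above
    # the magnitude at which float16 starts dropping 0.01-sized addends.
    x = (rng.uniform(size=16384) * 0.01).astype(np.float32)
    print("float64 reference:          ", float(np.sum(x, dtype=np.float64)))
    print("single fp16 accumulator:    ", naive_fp16_sum(x))
    print("chunk-based fp16 (chunk=64):", chunked_fp16_sum(x))
```

On this input the single float16 accumulator stalls once the running sum exceeds about 32 (each 0.01-sized addend is then below half an ulp), while the chunk-based variant stays close to the float64 reference near 82, which is the effect the paper exploits to cut accumulation precision from 32 to 16 bits.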
Cite

Text

Wang et al. "Training Deep Neural Networks with 8-Bit Floating Point Numbers." Neural Information Processing Systems, 2018.

Markdown

[Wang et al. "Training Deep Neural Networks with 8-Bit Floating Point Numbers." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/wang2018neurips-training/)

BibTeX
@inproceedings{wang2018neurips-training,
  title     = {{Training Deep Neural Networks with 8-Bit Floating Point Numbers}},
  author    = {Wang, Naigang and Choi, Jungwook and Brand, Daniel and Chen, Chia-Yu and Gopalakrishnan, Kailash},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {7675--7684},
  url       = {https://mlanthology.org/neurips/2018/wang2018neurips-training/}
}