Training and Inference of Large Language Models Using 8-Bit Floating Point

Abstract

FP8 formats are gaining popularity as a way to boost the computational efficiency of training and inference for large deep learning models. Their main challenge is that careful scaling is needed to prevent degradation caused by their reduced dynamic range relative to higher-precision formats. Although there is ample literature on selecting such scalings for INT formats, this critical aspect has yet to be addressed for FP8. This paper presents a methodology for selecting the scalings of FP8 linear layers, based on dynamically updating per-tensor scales for the weights, gradients and activations. We apply this methodology to train and validate large language models of the GPT and Llama 2 type using FP8, for model sizes ranging from 111M to 70B parameters. To facilitate understanding of the FP8 dynamics, our results are accompanied by plots of the per-tensor scale distributions for weights, activations and gradients during both training and inference.
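
The sketch below is a rough illustration of the per-tensor scaling idea mentioned in the abstract, not the authors' exact recipe: it uses a simple amax-based scale, assumes the E4M3 format (largest finite magnitude 448), simulates the FP8 cast by clipping rather than true FP8 rounding, and the `margin` parameter and helper names are hypothetical.

```python
# Minimal sketch (assumptions noted above): amax-based per-tensor scaling for an
# FP8 (E4M3) linear layer, simulated with NumPy. The scale maps the tensor's
# largest magnitude into the representable FP8 range before the cast, and both
# scales are divided out of the matmul result afterwards.
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3

def per_tensor_scale(x: np.ndarray, margin: float = 1.0) -> float:
    """Choose a scale so that amax(x) * scale fits inside the FP8 dynamic range."""
    amax = np.max(np.abs(x)) + 1e-12           # guard against division by zero
    return (FP8_E4M3_MAX / amax) / margin

def fake_quant_fp8(x: np.ndarray, scale: float) -> np.ndarray:
    """Simulate the FP8 cast by clipping the scaled tensor to the E4M3 range."""
    return np.clip(x * scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)

def fp8_linear(activations: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """y = x @ W with both operands scaled into FP8 range and rescaled afterwards."""
    s_x = per_tensor_scale(activations)
    s_w = per_tensor_scale(weights)
    y_scaled = fake_quant_fp8(activations, s_x) @ fake_quant_fp8(weights, s_w)
    return y_scaled / (s_x * s_w)               # undo both per-tensor scales

x = np.random.randn(4, 16).astype(np.float32)
W = 0.02 * np.random.randn(16, 8).astype(np.float32)
print(fp8_linear(x, W).shape)                   # (4, 8)
```

In the paper's setting the gradient tensors of the backward pass would be scaled the same way, with the scales updated dynamically as the tensor statistics drift during training.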

Cite

Text

Perez et al. "Training and Inference of Large Language Models Using 8-Bit Floating Point." NeurIPS 2023 Workshops: WANT, 2023.

Markdown

[Perez et al. "Training and Inference of Large Language Models Using 8-Bit Floating Point." NeurIPS 2023 Workshops: WANT, 2023.](https://mlanthology.org/neuripsw/2023/perez2023neuripsw-training/)

BibTeX

@inproceedings{perez2023neuripsw-training,
  title     = {{Training and Inference of Large Language Models Using 8-Bit Floating Point}},
  author    = {Perez, Sergio P. and Zhang, Yan and Briggs, James and Blake, Charlie and Levy-Kramer, Josh and Balanca, Paul and Luschi, Carlo and Barlow, Stephen and Fitzgibbon, Andrew W.},
  booktitle = {NeurIPS 2023 Workshops: WANT},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/perez2023neuripsw-training/}
}