Reducing Underflow in Mixed Precision Training by Gradient Scaling

Abstract

By leveraging the half-precision floating-point format (FP16), which is well supported by recent GPUs, mixed precision training (MPT) enables us to train larger models under the same or an even smaller budget. However, due to the limited representation range of FP16, gradients often suffer severe underflow, which hinders backpropagation and degrades model accuracy. MPT adopts loss scaling, which scales up the loss value just before backpropagation starts, to mitigate underflow by enlarging the magnitude of gradients. Unfortunately, a single scale is insufficient: gradients from different layers can have different distributions and require non-uniform scaling, so heuristics and hyperparameter tuning are needed to choose a proper scale and minimize the side-effects of loss scaling. We propose gradient scaling, a novel method that analytically calculates the appropriate scale for each gradient on the fly. It addresses underflow effectively without introducing numerical problems such as overflow, and without tedious hyperparameter tuning. Experiments on a variety of networks and tasks show that gradient scaling improves accuracy and reduces overall training effort compared with state-of-the-art MPT.
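To make the contrast concrete, below is a minimal Python sketch of the loss-scaling baseline the abstract describes, using PyTorch's torch.cuda.amp utilities, followed by a hypothetical per-tensor scale computation in the spirit of gradient scaling. The helper per_tensor_scale, its constants, and its formula are illustrative assumptions, not the paper's analytical derivation.

import math
import torch
import torch.nn.functional as F

# Baseline: standard loss scaling with a single global scale, as in MPT.
model = torch.nn.Linear(512, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()   # maintains one global loss scale

def train_step(x, y):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():     # forward pass in FP16 where safe
        loss = F.cross_entropy(model(x), y)
    scaler.scale(loss).backward()       # one scale applied to every gradient
    scaler.step(optimizer)              # unscales grads; skips step on inf/NaN
    scaler.update()                     # heuristically grows/shrinks the scale

# Hypothetical per-tensor scale in the spirit of gradient scaling: pick a
# power-of-two factor per gradient so its values sit above FP16's underflow
# threshold. The formula below is an illustrative assumption, not the
# paper's exact method.
FP16_MIN_NORMAL = 2.0 ** -14            # smallest normal FP16 magnitude

def per_tensor_scale(grad: torch.Tensor) -> float:
    mean_mag = grad.abs().mean().item()
    if mean_mag == 0.0:
        return 1.0
    # Lift the mean magnitude a few binades above the underflow edge.
    exponent = math.floor(math.log2(FP16_MIN_NORMAL / mean_mag)) + 8
    return 2.0 ** max(exponent, 0)

Because both scale choices are powers of two, multiplying and later dividing a gradient by them changes only the exponent bits and introduces no rounding error, which is one reason power-of-two scales are the common convention in mixed precision training.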

Cite

Text

Zhao et al. "Reducing Underflow in Mixed Precision Training by Gradient Scaling." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/404

Markdown

[Zhao et al. "Reducing Underflow in Mixed Precision Training by Gradient Scaling." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/zhao2020ijcai-reducing/) doi:10.24963/IJCAI.2020/404

BibTeX

@inproceedings{zhao2020ijcai-reducing,
  title     = {{Reducing Underflow in Mixed Precision Training by Gradient Scaling}},
  author    = {Zhao, Ruizhe and Vogel, Brian and Ahmed, Tanvir and Luk, Wayne},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {2922--2928},
  doi       = {10.24963/IJCAI.2020/404},
  url       = {https://mlanthology.org/ijcai/2020/zhao2020ijcai-reducing/}
}