Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point
Abstract
In this paper, we explore the limits of Microsoft Floating Point (MSFP), a new class of datatypes developed for production cloud-scale inferencing on custom hardware. Through the co-evolution of hardware design and algorithms, MSFP achieves accuracy comparable to or better than industry standards Bfloat16 and INT8 at 3x and 4x lower cost, respectively. MSFP incurs negligible impact to accuracy (<1%), requires no changes to the model topology, and is integrated with a mature cloud production pipeline. MSFP supports various classes of deep learning models including CNNs, RNNs, and Transformers without modification. Finally, we characterize the accuracy and implementation of MSFP and demonstrate its efficacy on a number of production scenarios, including models that power major online scenarios such as web search, question-answering, and image classification.
Cite
Text
Rouhani et al. "Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point." Neural Information Processing Systems, 2020.
Markdown
[Rouhani et al. "Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/rouhani2020neurips-pushing/)
BibTeX
@inproceedings{rouhani2020neurips-pushing,
title = {{Pushing the Limits of Narrow Precision Inferencing at Cloud Scale with Microsoft Floating Point}},
author = {Rouhani, Bita Darvish and Lo, Daniel and Zhao, Ritchie and Liu, Ming and Fowers, Jeremy and Ovtcharov, Kalin and Vinogradsky, Anna and Massengill, Sarah and Yang, Lita and Bittner, Ray and Forin, Alessandro and Zhu, Haishan and Na, Taesik and Patel, Prerak and Che, Shuai and Koppaka, Lok Chand and Song, Xia and Som, Subhojit and Das, Kaustav and T, Saurabh and Reinhardt, Steve and Lanka, Sitaram and Chung, Eric and Burger, Doug},
booktitle = {Neural Information Processing Systems},
year = {2020},
url = {https://mlanthology.org/neurips/2020/rouhani2020neurips-pushing/}
}