SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training Using Gradient Similarity Measurement

Abstract

Large scale training requires massive parallelism to finish training within a reasonable amount of time. Large batch training is the key enabler of massive parallelism, but it often comes at the cost of generalization performance. Existing works explore adaptive batching or hand-tuned static large batching to strike a balance between computational efficiency and performance. However, these methods can provide only coarse-grained adaptation (e.g., at the epoch level) due to intrinsically expensive calculations or hand-tuning requirements. In this paper, we propose a fully automated and lightweight adaptive batching methodology that enables fine-grained batch size adaptation (e.g., at the mini-batch level) and achieves state-of-the-art performance with record-breaking batch sizes. The core component of our method is a lightweight yet efficient representation of the critical gradient noise information. We open-source the proposed methodology as a plugin tool that supports mainstream machine learning frameworks. Extensive evaluations on popular benchmarks (e.g., CIFAR10, ImageNet, and BERT-Large) demonstrate that the proposed methodology outperforms state-of-the-art adaptive batching approaches and hand-tuned static strategies in both performance and batch size. In particular, we achieve a new state-of-the-art batch size of 78k in BERT-Large pretraining with a SQuAD score of 90.69, compared to 90.58 reported by the previous state of the art with a 59k batch size.
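To make the core idea concrete, the snippet below is a minimal PyTorch sketch of gradient-similarity-based batch size adaptation, not the released plugin's actual API: it estimates gradient noise by comparing the gradients of two halves of a batch via cosine similarity, then grows or shrinks the batch size accordingly. All names, thresholds, and scaling factors here (`gradient_cosine_similarity`, `adapt_batch_size`, `target=0.8`, etc.) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def gradient_cosine_similarity(model, loss_fn, batch_a, batch_b):
    """Cosine similarity between gradients computed on two halves of a batch.

    High similarity suggests low gradient noise (the batch can afford to grow);
    low similarity suggests high noise (the batch should shrink).
    """
    def flat_grad(batch):
        model.zero_grad()
        inputs, targets = batch
        loss_fn(model(inputs), targets).backward()
        # torch.cat copies the per-parameter gradients into one flat vector,
        # so later zero_grad() calls do not invalidate the result.
        return torch.cat([p.grad.flatten() for p in model.parameters()
                          if p.grad is not None])

    g_a, g_b = flat_grad(batch_a), flat_grad(batch_b)
    return F.cosine_similarity(g_a, g_b, dim=0).item()

def adapt_batch_size(similarity, batch_size, target=0.8,
                     grow=1.1, shrink=0.9, min_bs=32, max_bs=65536):
    """Illustrative multiplicative rule: grow the batch when the half-batch
    gradients agree more than the target similarity, shrink otherwise."""
    factor = grow if similarity > target else shrink
    return int(min(max(batch_size * factor, min_bs), max_bs))
```

In a data-parallel setting, the same comparison can be made nearly free by reusing gradients the workers already compute: split the workers into two groups, compare their aggregated gradients before the all-reduce, and update the target batch size every step, which is what makes mini-batch-level adaptation cheap in principle.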

Cite

Text

Qin et al. "SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training Using Gradient Similarity Measurement." Neural Information Processing Systems, 2021.

Markdown

[Qin et al. "SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training Using Gradient Similarity Measurement." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/qin2021neurips-simigrad/)

BibTeX

@inproceedings{qin2021neurips-simigrad,
  title     = {{SimiGrad: Fine-Grained Adaptive Batching for Large Scale Training Using Gradient Similarity Measurement}},
  author    = {Qin, Heyang and Rajbhandari, Samyam and Ruwase, Olatunji and Yan, Feng and Yang, Lei and He, Yuxiong},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/qin2021neurips-simigrad/}
}