Accelerating Transformer Pre-Training with 2:4 Sparsity
Abstract
Training large transformers is slow, but recent innovations in GPU architecture give us an advantage. NVIDIA Ampere GPUs can execute a fine-grained 2:4 sparse matrix multiplication twice as fast as its dense equivalent. In light of this property, we comprehensively investigate the feasibility of accelerating the feed-forward networks (FFNs) of transformers during pre-training. First, we define a “flip rate” to monitor the stability of a 2:4 training process. Using this metric, we propose three techniques to preserve accuracy: modifying the sparse-refined straight-through estimator by applying a masked decay term to the gradients, determining a feasible decay factor during the warm-up stage, and enhancing the model’s quality with a dense fine-tuning procedure near the end of pre-training. In addition, we devise two techniques to accelerate training in practice: computing transposable 2:4 masks by convolution, and accelerating gated activation functions by reducing GPU L2 cache misses. Experiments show that our 2:4 sparse training algorithm achieves convergence similar to dense training on several transformer pre-training tasks, while delivering measurable acceleration across different transformer block shapes. Our toolkit is available at https://github.com/huyz2023/2by4-pretrain.
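To make the two core quantities of the abstract concrete, here is a minimal PyTorch sketch (not the authors' toolkit API; function names are illustrative) of how a magnitude-based 2:4 mask and the "flip rate" between two training steps could be computed.

```python
# Minimal sketch, assuming PyTorch. These helpers are illustrative only and do
# not reproduce the 2by4-pretrain toolkit's actual interfaces.
import torch

def make_24_mask(weight: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude entries in every contiguous group of 4."""
    w = weight.reshape(-1, 4)                      # group elements in fours
    idx = w.abs().topk(2, dim=-1).indices          # positions of the 2 kept entries
    mask = torch.zeros_like(w, dtype=torch.bool)
    mask.scatter_(-1, idx, True)
    return mask.reshape(weight.shape)

def flip_rate(prev_mask: torch.Tensor, new_mask: torch.Tensor) -> float:
    """Fraction of mask positions that change between two training steps."""
    return (prev_mask ^ new_mask).float().mean().item()

# Usage: track mask stability across a (stand-in) weight update.
w = torch.randn(8, 16)
m0 = make_24_mask(w)
w = w + 0.1 * torch.randn_like(w)                  # stand-in for an optimizer step
m1 = make_24_mask(w)
print(f"flip rate: {flip_rate(m0, m1):.3f}")
```

A high flip rate indicates that the sparsity pattern is still churning between steps, which is the instability the paper's masked-decay and warm-up techniques aim to control.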
Cite
Text
Hu et al. "Accelerating Transformer Pre-Training with 2:4 Sparsity." International Conference on Machine Learning, 2024.

Markdown

[Hu et al. "Accelerating Transformer Pre-Training with 2:4 Sparsity." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/hu2024icml-accelerating/)

BibTeX
@inproceedings{hu2024icml-accelerating,
title = {{Accelerating Transformer Pre-Training with 2:4 Sparsity}},
author = {Hu, Yuezhou and Zhao, Kang and Huang, Weiyu and Chen, Jianfei and Zhu, Jun},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {19531--19543},
volume = {235},
url = {https://mlanthology.org/icml/2024/hu2024icml-accelerating/}
}