Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping

Abstract

Transformer models are widely used in AI applications such as Natural Language Processing (NLP), Computer Vision (CV), etc. However, the enormous computation workload becomes an obstacle to training large transformer models efficiently. Recently, some methods have focused on reducing the computation workload during training by skipping some layers. However, these methods use a simple probability distribution and coarse-grained probability calculation, which significantly degrade model accuracy. To address this issue, in this paper we propose a novel training acceleration method, Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on and off in the proper order while keeping high accuracy. Besides, we adjust the probability of skipping transformer layers with a scheduler to accelerate training and achieve faster convergence. Our results show that SBLD solves the accuracy drop issue observed with prior layer dropping methods. Our SBLD method decreases end-to-end training time by 19.67% when training the GPT-3 Medium model, while at the same time increasing accuracy by 1.65% w.r.t. the baseline. Furthermore, for the SwinV2-L model the obtained Top-1 and Top-5 accuracies are also higher than the baseline. Thus, the proposed method is efficient and practical for improving large transformer model training.
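The abstract only summarizes the approach; below is a minimal sketch of the core idea, assuming a PyTorch-style encoder. The names `SkippableEncoder`, `sensitivity`, and `drop_prob_schedule` are illustrative, not the paper's implementation: less sensitive layers receive a larger skip probability, and a scheduler ramps the maximum drop probability as training proceeds.

```python
# Sketch of sensitivity-based layer dropping (hypothetical names and shapes).
# Layers with lower sensitivity scores are skipped more often, and a simple
# linear scheduler increases the overall drop probability during training.
import torch
import torch.nn as nn

class SkippableEncoder(nn.Module):
    def __init__(self, num_layers=12, d_model=256, nhead=4, sensitivity=None):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        # Per-layer sensitivity scores (assumed given, e.g. from a profiling pass);
        # a higher score means the layer matters more for accuracy.
        if sensitivity is None:
            sensitivity = torch.linspace(1.0, 0.1, num_layers)
        self.sensitivity = sensitivity

    def forward(self, x, max_drop_prob=0.0):
        # Map sensitivities to per-layer drop probabilities:
        # the most sensitive layer is never dropped, less sensitive layers
        # are dropped with probability approaching max_drop_prob.
        drop_prob = max_drop_prob * (1.0 - self.sensitivity / self.sensitivity.max())
        for layer, p in zip(self.layers, drop_prob):
            if self.training and torch.rand(1).item() < p.item():
                continue  # skip this layer for the current training step
            x = layer(x)
        return x

def drop_prob_schedule(step, total_steps, final_prob=0.5):
    """Linearly ramp the maximum drop probability over training."""
    return final_prob * min(1.0, step / total_steps)

# Usage: increase the drop probability as training progresses.
model = SkippableEncoder()
x = torch.randn(8, 16, 256)  # (batch, seq_len, d_model)
for step in range(100):
    p = drop_prob_schedule(step, total_steps=100)
    out = model(x, max_drop_prob=p)
```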

Cite

Text

Zeng et al. "Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I9.26321

Markdown

[Zeng et al. "Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/zeng2023aaai-acceleration/) doi:10.1609/AAAI.V37I9.26321

BibTeX

@inproceedings{zeng2023aaai-acceleration,
  title     = {{Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping}},
  author    = {Zeng, Yujie and He, Wenlong and Vasyltsov, Ihor V. and Pang, Jiali and Chen, Lin},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {11156--11163},
  doi       = {10.1609/AAAI.V37I9.26321},
  url       = {https://mlanthology.org/aaai/2023/zeng2023aaai-acceleration/}
}