Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping

Abstract

Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from prohibitive overall computational cost. Current methods for accelerating pre-training either rely on massive parallelism with advanced hardware or are not applicable to language models. In this work, we propose a method based on progressive layer dropping that accelerates the training of Transformer-based language models without requiring additional hardware resources.
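
The abstract names the technique but does not describe it; as a rough illustration, below is a minimal PyTorch sketch of the general idea behind progressive layer dropping: each Transformer block is skipped stochastically during training, with a keep probability that decays over training steps and shrinks with depth. The schedule in keep_probability, its parameters theta_bar and gamma, and the pre-LN block design are illustrative assumptions, not the paper's exact formulation.

# Sketch of progressive layer dropping (assumed schedule, not the paper's exact one).
import math
import torch
import torch.nn as nn


def keep_probability(layer_idx: int, num_layers: int, step: int,
                     theta_bar: float = 0.5, gamma: float = 1e-4) -> float:
    """Keep probability that decays over training and is lower for deeper layers."""
    # Global schedule: starts at 1.0 and decays exponentially toward theta_bar.
    theta_t = (1.0 - theta_bar) * math.exp(-gamma * step) + theta_bar
    # Deeper layers (larger layer_idx) are dropped more often.
    return 1.0 - (layer_idx + 1) / num_layers * (1.0 - theta_t)


class DroppableBlock(nn.Module):
    """Pre-LN Transformer block that can be skipped as a whole during training."""

    def __init__(self, d_model: int = 768, nhead: int = 12, d_ff: int = 3072):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor, p: float) -> torch.Tensor:
        # With probability (1 - p), skip the block entirely (identity mapping).
        if self.training and torch.rand(1).item() > p:
            return x
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.ff(self.norm2(x))
        return x


# Usage: at training step `step`, each layer gets its own keep probability.
num_layers, step = 12, 10_000
blocks = nn.ModuleList([DroppableBlock() for _ in range(num_layers)])
x = torch.randn(2, 128, 768)  # (batch, sequence length, hidden size)
for i, block in enumerate(blocks):
    x = block(x, p=keep_probability(i, num_layers, step))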

Cite

Text

Zhang and He. "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping." Neural Information Processing Systems, 2020.

Markdown

[Zhang and He. "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/zhang2020neurips-accelerating/)

BibTeX

@inproceedings{zhang2020neurips-accelerating,
  title     = {{Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping}},
  author    = {Zhang, Minjia and He, Yuxiong},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/zhang2020neurips-accelerating/}
}