Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping
Abstract
Recently, Transformer-based language models have demonstrated remarkable performance across many NLP domains. However, the unsupervised pre-training step of these models suffers from prohibitive overall computational expense. Current methods for accelerating pre-training either rely on massive parallelism with advanced hardware or are not applicable to language modeling.
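The title names the core technique: progressive layer dropping, i.e., stochastically skipping Transformer layers during pre-training, with a drop rate that ramps up over the course of training and is distributed across depth. The sketch below is a minimal, hypothetical PyTorch illustration of that idea; the module names (`ProgressiveLayerDrop`, `TinyTransformer`), the schedule constants, and the exact keep-probability formula are assumptions for illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn


class ProgressiveLayerDrop:
    """Keep-probability schedule (assumed form): starts at 1.0 (no dropping)
    and decays toward a floor `alpha_bar` as training progresses."""

    def __init__(self, alpha_bar=0.5, gamma=1e-4):
        self.alpha_bar = alpha_bar
        self.gamma = gamma

    def theta(self, step):
        # Global keep probability at this training step.
        return (1.0 - self.alpha_bar) * math.exp(-self.gamma * step) + self.alpha_bar

    def keep_prob(self, layer_idx, num_layers, step):
        # Deeper layers are dropped more aggressively (linear-in-depth assumption).
        theta = self.theta(step)
        return 1.0 - (layer_idx + 1) / num_layers * (1.0 - theta)


class TinyTransformer(nn.Module):
    """A stack of standard encoder layers whose blocks are skipped
    stochastically according to the schedule above (training only)."""

    def __init__(self, d_model=256, nhead=4, num_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_layers)
        )
        self.pld = ProgressiveLayerDrop()

    def forward(self, x, step):
        for i, layer in enumerate(self.layers):
            p = self.pld.keep_prob(i, len(self.layers), step)
            if self.training and torch.rand(1).item() > p:
                continue  # skip this layer entirely: identity mapping
            x = layer(x)
        return x


if __name__ == "__main__":
    model = TinyTransformer()
    tokens = torch.randn(2, 16, 256)  # (batch, seq_len, d_model)
    out = model(tokens, step=10_000)
    print(out.shape)  # torch.Size([2, 16, 256])
```

Skipping a layer removes both its forward and backward computation for that step, which is where the training-time savings come from; the paper additionally adjusts the Transformer block so that layers can be dropped stably. See the full text for the actual schedule and architecture.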
Cite
Text
Zhang and He. "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping." Neural Information Processing Systems, 2020.

Markdown

[Zhang and He. "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/zhang2020neurips-accelerating/)

BibTeX
@inproceedings{zhang2020neurips-accelerating,
  title     = {{Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping}},
  author    = {Zhang, Minjia and He, Yuxiong},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/zhang2020neurips-accelerating/}
}