Efficient Training of Language Models Using Few-Shot Learning

Abstract

Large deep learning models have achieved state-of-the-art performance across various natural language processing (NLP) tasks and demonstrated remarkable few-shot learning ability. However, training them is often challenging and resource-intensive. In this paper, we study an efficient approach to training language models using few-shot learners. We show that, by leveraging the fast learning nature of few-shot learners, one can train language models efficiently in a stagewise manner. Our main insight is that stacking a good few-shot learner on a good small language model provides a good initializer for a larger language model. Building upon this insight and on progressive stacking approaches, we develop novel methods for training such networks stagewise. We also provide a theoretical framework and accompanying empirical studies to support our insights, thereby establishing a theoretical foundation for progressive stacking. Finally, we present empirical results demonstrating the effectiveness of our approach in reducing the training time of few-shot learners.
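
The core mechanism behind progressive stacking is to grow a network in depth by reusing the parameters of a trained shallower model as the initialization of a deeper one. As a rough illustration only (not the authors' code), the following PyTorch sketch shows one common stacking variant in which a small trained transformer's blocks are tiled to initialize a model with twice as many layers; the names SmallLM and grow_by_stacking and all hyperparameters are illustrative assumptions.

import copy
import torch.nn as nn

class SmallLM(nn.Module):
    # Toy stand-in for a small trained language model: a stack of transformer blocks.
    def __init__(self, d_model=256, n_heads=4, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def grow_by_stacking(small: SmallLM, growth_factor: int = 2) -> SmallLM:
    # Initialize a deeper model by tiling the trained blocks `growth_factor` times.
    big = SmallLM(n_layers=len(small.layers) * growth_factor)
    for i, layer in enumerate(big.layers):
        src = small.layers[i % len(small.layers)]
        layer.load_state_dict(copy.deepcopy(src.state_dict()))
    return big

# Stagewise schedule (schematic): train the small model, stack it into a deeper
# model, then continue training the deeper model from that initialization.
# small = train(SmallLM()); big = grow_by_stacking(small); big = train(big)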

Cite

Text

Reddi et al. "Efficient Training of Language Models Using Few-Shot Learning." International Conference on Machine Learning, 2023.

Markdown

[Reddi et al. "Efficient Training of Language Models Using Few-Shot Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/jreddi2023icml-efficient/)

BibTeX

@inproceedings{jreddi2023icml-efficient,
  title     = {{Efficient Training of Language Models Using Few-Shot Learning}},
  author    = {Reddi, Sashank J. and Miryoosefi, Sobhan and Karp, Stefani and Krishnan, Shankar and Kale, Satyen and Kim, Seungyeon and Kumar, Sanjiv},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {14553--14568},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/jreddi2023icml-efficient/}
}