Efficient Training of Language Models Using Few-Shot Learning
Abstract
Large deep learning models have achieved state-of-the-art performance across various natural language processing (NLP) tasks and demonstrated remarkable few-shot learning performance. However, training them is often challenging and resource-intensive. In this paper, we study an efficient approach to training language models using few-shot learners. We show that, by leveraging the fast learning nature of few-shot learners, one can train language models efficiently in a stagewise manner. Our main insight is that stacking a good few-shot learner on a good small language model provides a good initializer for a larger language model. Using this insight and building upon progressive stacking approaches, we develop novel approaches for training such networks in a stagewise manner. Furthermore, we provide a theoretical framework and accompanying empirical studies to support our insights, thereby creating a theoretical foundation for progressive stacking. Finally, we provide empirical results to demonstrate the effectiveness of our approach in reducing the training time of few-shot learners.
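The stagewise recipe sketched in the abstract can be illustrated with a small, self-contained example. The snippet below is not the authors' implementation; the layer representation, the `stack` growth factor, and the `train_stage` placeholder are hypothetical stand-ins, used only to show how progressive stacking replicates the blocks of a trained shallow model to initialize a deeper one before the next training stage.

```python
import copy

def stack(layers, growth_factor=2):
    # Progressive stacking (sketch): build a deeper model by replicating
    # the trained blocks of the current, shallower model. In the paper's
    # framing, the copied upper blocks act like a few-shot learner placed
    # on top of an already-trained small language model.
    grown = []
    for _ in range(growth_factor):
        grown.extend(copy.deepcopy(layers))
    return grown

def train_stage(layers, steps):
    # Hypothetical placeholder for a training stage: a real implementation
    # would run gradient updates on the transformer parameters here.
    for _ in range(steps):
        pass
    return layers

# Toy stagewise schedule: 3 blocks -> 6 blocks -> 12 blocks.
model = [{"block": i, "params": None} for i in range(3)]
for stage, steps in enumerate([1000, 1000, 1000]):
    model = train_stage(model, steps)
    if stage < 2:  # grow the network before the next stage
        model = stack(model)
print(f"final depth: {len(model)} blocks")
```

Under the abstract's insight, the stacked copy on top of the trained lower half serves as a fast-adapting initializer, which is why each grown model starts the next stage close to a good solution rather than from scratch.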
Cite
Text
Reddi et al. "Efficient Training of Language Models Using Few-Shot Learning." International Conference on Machine Learning, 2023.
Markdown
[Reddi et al. "Efficient Training of Language Models Using Few-Shot Learning." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/jreddi2023icml-efficient/)
BibTeX
@inproceedings{jreddi2023icml-efficient,
  title     = {{Efficient Training of Language Models Using Few-Shot Learning}},
  author    = {Reddi, Sashank J. and Miryoosefi, Sobhan and Karp, Stefani and Krishnan, Shankar and Kale, Satyen and Kim, Seungyeon and Kumar, Sanjiv},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {14553--14568},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/jreddi2023icml-efficient/}
}