Continual Pre-Training of Large Language Models: How to Re-Warm Your Model?
Abstract
Large language models (LLMs) are routinely pre-trained on billions of tokens, only for the process to be restarted from scratch once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e., updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work we examine the effect of different warmup strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths within the first 50B tokens. Our results show that not warming up at all and keeping a constant learning rate gives the best performance on both downstream and upstream validation data.
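To make the schedule concrete, below is a minimal sketch of a linear warmup followed by cosine decay, the schedule named in the abstract. The function name, step counts, and learning-rate values are illustrative assumptions, not the paper's hyperparameters; re-warming corresponds to restarting such a schedule when continued pre-training on the new dataset begins, whereas the no-warmup baseline simply holds the learning rate constant.

import math

def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=2000, total_steps=100_000):
    """Learning rate at a given optimizer step (placeholder values, not the paper's)."""
    if step < warmup_steps:
        # Linear warmup: ramp from 0 up to max_lr over warmup_steps steps.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay: anneal from max_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))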
Cite
Text
Gupta et al. "Continual Pre-Training of Large Language Models: How to Re-Warm Your Model?" ICML 2023 Workshops: ES-FoMO, 2023.
Markdown
[Gupta et al. "Continual Pre-Training of Large Language Models: How to Re-Warm Your Model?" ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/gupta2023icmlw-continual/)
BibTeX
@inproceedings{gupta2023icmlw-continual,
title = {{Continual Pre-Training of Large Language Models: How to Re-Warm Your Model?}},
author = {Gupta, Kshitij and Thérien, Benjamin and Ibrahim, Adam and Richter, Mats Leon and Anthony, Quentin Gregory and Belilovsky, Eugene and Rish, Irina and Lesort, Timothée},
booktitle = {ICML 2023 Workshops: ES-FoMO},
year = {2023},
url = {https://mlanthology.org/icmlw/2023/gupta2023icmlw-continual/}
}