Model Recycling: Model Component Reuse to Promote In-Context Learning

Abstract

In-context learning (ICL) is a behavior seen in transformer-based models where, during inference, the model leverages examples of a novel task provided in its context to perform that task accurately. Here we study the role of different model components in ICL behavior via model component recycling. Previous work has found a plateau in the training loss before models begin to learn a general-purpose ICL solution. We explore a model recycling experiment related to ICL, investigating whether recycling model components can reduce this early plateau in the training loss and whether certain components affect ICL more than others. We find that transferring the embeddings and early transformer layers from a trained model to an untrained model eliminates the plateau seen in standard model training. Moreover, transferring only later transformer layers does not significantly reduce the plateau, indicating the importance of the embeddings and early transformer layers for ICL performance.
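The sketch below illustrates the component-recycling procedure the abstract describes: copying the embedding and the first few transformer layers from a trained "donor" model into a freshly initialized model before training it. The model architecture, layer counts, and the `recycle_components` helper are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of model component recycling, assuming a simple
# embedding + stacked transformer-layer architecture (not the paper's code).
import torch
import torch.nn as nn


class TinyTransformer(nn.Module):
    def __init__(self, vocab_size=128, d_model=64, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        x = self.embed(tokens)
        for layer in self.layers:
            x = layer(x)
        return self.head(x)


def recycle_components(donor, target, n_early_layers=2, copy_embedding=True):
    """Copy the embedding and the first n_early_layers from donor into target."""
    with torch.no_grad():
        if copy_embedding:
            target.embed.load_state_dict(donor.embed.state_dict())
        for i in range(n_early_layers):
            target.layers[i].load_state_dict(donor.layers[i].state_dict())


trained = TinyTransformer()  # stands in for a fully trained model
fresh = TinyTransformer()    # freshly initialized model to be trained
recycle_components(trained, fresh, n_early_layers=2, copy_embedding=True)
# `fresh` is then trained as usual; the abstract reports that transferring the
# embeddings and early layers removes the early loss plateau, while copying
# only later layers does not.
```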

Cite

Text

Smith et al. "Model Recycling: Model Component Reuse to Promote In-Context Learning." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[Smith et al. "Model Recycling: Model Component Reuse to Promote In-Context Learning." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/smith2024neuripsw-model/)

BibTeX

@inproceedings{smith2024neuripsw-model,
  title     = {{Model Recycling: Model Component Reuse to Promote In-Context Learning}},
  author    = {Smith, Lindsay M. and Goddard, Chase and Ngampruetikorn, Vudtiwat and Schwab, David J.},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/smith2024neuripsw-model/}
}