The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory Before In-Context Learning
Abstract
We study how down-scaling large language model (LLM) size impacts LLM capabilities. We begin by measuring the effects of weight pruning – a popular technique for reducing model size – on two abilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in context. Surprisingly, we find that existing pruning techniques affect these two abilities differently. For example, pruning more than 30% of weights significantly decreases an LLM’s ability to recall facts presented during pre-training. Yet pruning 60-70% of weights largely preserves an LLM’s ability to process information in-context, ranging from retrieving answers based on information presented in context to learning parameterized functions such as a linear classifier from a few examples. In short, moderate pruning impairs an LLM’s ability to recall facts learnt during pre-training, while its effect on the model’s ability to process information presented in context is far less pronounced. These disparate effects similarly arise when replacing the original model with a smaller dense one of reduced width and depth, suggesting that model size reduction in general, rather than pruning specifically, underpins the disparity.
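The abstract refers to weight pruning as a way to shrink a model. As a point of reference, below is a minimal sketch of unstructured magnitude pruning – one common variant, not necessarily the exact techniques evaluated in the paper – which zeroes out a target fraction of the smallest-magnitude weights:

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that roughly a
    `sparsity` fraction of the weights become zero (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of entries to zero
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example on a random weight matrix (stand-in for an LLM layer):
rng = np.random.default_rng(0)
w = rng.normal(size=(128, 128))
pruned = magnitude_prune(w, 0.6)
print(1 - np.count_nonzero(pruned) / pruned.size)  # realised sparsity, close to 0.6
```

At the 60-70% sparsity levels the abstract mentions, a mask like this removes most weights while – per the paper's findings – largely preserving in-context processing.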
Cite
Text
Jin et al. "The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory Before In-Context Learning." International Conference on Learning Representations, 2024.
Markdown
[Jin et al. "The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory Before In-Context Learning." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/jin2024iclr-cost/)
BibTeX
@inproceedings{jin2024iclr-cost,
title = {{The Cost of Scaling Down Large Language Models: Reducing Model Size Affects Memory Before In-Context Learning}},
author = {Jin, Tian and Clement, Nolan and Dong, Xin and Nagarajan, Vaishnavh and Carbin, Michael and Ragan-Kelley, Jonathan and Dziugaite, Gintare Karolina},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/jin2024iclr-cost/}
}