Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models
Abstract
In this work, we consider whether pretraining on a pruned, high-quality subset of a large-scale text dataset can improve LLM performance. While existing work has shown that pruning based on the perplexity of a larger model can yield high-quality data, we investigate whether smaller models can be used for perplexity-based pruning and how pruning is affected by the domain composition of the data being pruned. We demonstrate that for multiple dataset compositions, perplexity-based pruning of pretraining data can *significantly* improve downstream task performance: pruning based on perplexities computed with a 125 million parameter model improves the average downstream task accuracy of a 3 billion parameter model by up to 1.35% and achieves up to a 1.36× reduction in the pretraining steps needed to reach commensurate baseline performance.
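Below is a minimal sketch of the kind of pipeline the abstract describes: score each document's perplexity with a small reference model, then keep a subset based on those scores. It assumes Hugging Face `transformers`; the reference model name (`EleutherAI/pythia-160m`, a stand-in for a ~125M model), the 50% keep fraction, and the choice to retain the lowest-perplexity documents are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of perplexity-based data pruning with a small reference model.
# Assumptions (not from the paper): the model name, the keep fraction, and
# keeping the *lowest*-perplexity documents are illustrative placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/pythia-160m"  # stand-in for a ~125M reference model
KEEP_FRACTION = 0.5                    # fraction of documents to retain (assumed)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference model (exp of mean token NLL)."""
    ids = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    out = model(**ids, labels=ids["input_ids"])  # HF averages CE loss over tokens
    return math.exp(out.loss.item())

def prune(documents: list[str]) -> list[str]:
    """Score every document, then keep the KEEP_FRACTION with lowest perplexity."""
    ranked = sorted(documents, key=perplexity)
    return ranked[: int(len(ranked) * KEEP_FRACTION)]

if __name__ == "__main__":
    corpus = ["The cat sat on the mat.", "asdf qwerty zxcv uiop"]
    print(prune(corpus))
```

In practice the selection criterion is a design choice: one could equally keep the highest-perplexity or middle-perplexity documents, and which works best may depend on the domain composition of the corpus being pruned.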
Cite
Text
Ankner et al. "Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models." ICLR 2024 Workshops: ME-FoMo, 2024.
Markdown
[Ankner et al. "Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models." ICLR 2024 Workshops: ME-FoMo, 2024.](https://mlanthology.org/iclrw/2024/ankner2024iclrw-perplexed/)
BibTeX
@inproceedings{ankner2024iclrw-perplexed,
  title = {{Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models}},
  author = {Ankner, Zachary and Blakeney, Cody and Sreenivasan, Kartik and Marion, Max and Leavitt, Matthew L and Paul, Mansheej},
  booktitle = {ICLR 2024 Workshops: ME-FoMo},
  year = {2024},
  url = {https://mlanthology.org/iclrw/2024/ankner2024iclrw-perplexed/}
}