On Efficient Distillation from LLMs to SLMs
Abstract
Finetuning small language models (SLMs) on data generated by large language models (LLMs), a form of knowledge distillation, has recently been shown to significantly enhance the capabilities of small models across various domains (e.g., mathematical reasoning). However, current approaches typically require synthesizing a large number of new examples ($>100\textrm{K}$), which increases the resources and training time needed for finetuning. To address this issue, we investigate principles for making the distillation process more efficient by reducing the amount of synthetic data required. Specifically, we explore (i) incorporating the SLM's feedback into the LLM's data-generation process and (ii) including the LLM's rationales (i.e., step-by-step solutions) in the distilled data. In our experiments using the Mistral-7B model as the SLM on math reasoning tasks (GSM8K, MATH), we find that both feedback and rationales can make finetuning with distillation more efficient, requiring up to $\sim2\text{x}$ less synthetic data.
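To make idea (i) concrete, below is a minimal Python sketch of a feedback-guided distillation loop under our own assumptions: synthetic examples (with rationales, per idea (ii)) are generated only for problems the SLM currently answers incorrectly, which is one way the amount of synthetic data can be reduced. The functions `llm_generate_example` and `slm_answer` are hypothetical stand-ins, not the paper's API or any real model interface.

```python
def llm_generate_example(problem: str) -> dict:
    """Hypothetical stub: ask the LLM for a step-by-step rationale
    and final answer (idea (ii)); a real version would call an LLM."""
    return {"question": problem, "rationale": "step 1 ... step n", "answer": "42"}


def slm_answer(problem: str) -> str:
    """Hypothetical stub: the SLM's current answer to a problem."""
    return "0"


def build_distillation_set(problems: list[str], gold_answers: list[str]) -> list[dict]:
    """Feedback step (idea (i)): synthesize training examples only where
    the SLM still fails, instead of distilling on every problem."""
    dataset = []
    for problem, gold in zip(problems, gold_answers):
        if slm_answer(problem) != gold:
            dataset.append(llm_generate_example(problem))
    return dataset


if __name__ == "__main__":
    problems = ["What is 6 * 7?"]
    gold_answers = ["42"]
    print(build_distillation_set(problems, gold_answers))
```

The resulting `dataset` would then be used to finetune the SLM; this is only a sketch of the selection logic, not the finetuning pipeline itself.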
Cite
Text
Jazbec et al. "On Efficient Distillation from LLMs to SLMs." NeurIPS 2024 Workshops: FITML, 2024.
Markdown
[Jazbec et al. "On Efficient Distillation from LLMs to SLMs." NeurIPS 2024 Workshops: FITML, 2024.](https://mlanthology.org/neuripsw/2024/jazbec2024neuripsw-efficient/)
BibTeX
@inproceedings{jazbec2024neuripsw-efficient,
  title = {{On Efficient Distillation from LLMs to SLMs}},
  author = {Jazbec, Metod and Xia, Menglin and Mallick, Ankur and Madrigal, Daniel and Han, Dongge and Kessler, Samuel and Rühle, Victor},
  booktitle = {NeurIPS 2024 Workshops: FITML},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/jazbec2024neuripsw-efficient/}
}