Distilling System 2 into System 1
Abstract
Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought \citep{CoT}, many such {\em System 2} techniques have been proposed such as Rephrase and Respond \citep{RaR}, System 2 Attention \citep{S2A} and Branch-Solve-Merge \citep{BSM}. In this work we investigate self-supervised methods to ``compile'' (distill) higher quality outputs from System 2 techniques back into LLM generations {\em without} intermediate reasoning token sequences, as this reasoning has been distilled into {\em System 1}. We show that several such techniques can be successfully distilled, resulting in improved results compared to the original System 1 performance, and with less inference cost than System 2. We posit that System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
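As a rough illustration of the idea described in the abstract, the sketch below shows one way such a self-supervised distillation pipeline could look: sample several System 2 (here, Chain-of-Thought) outputs per input, keep only inputs whose final answers agree under majority vote (an unsupervised quality filter), and use the resulting (input, answer) pairs, with no intermediate reasoning tokens, as fine-tuning targets for the System 1 model. The `llm_generate` callable, the answer-parsing convention, and the agreement threshold are all hypothetical; the paper's exact prompts and consistency criteria may differ.

```python
from collections import Counter
from typing import Callable, List, Tuple


def system2_answer(llm_generate: Callable[[str], str], question: str) -> str:
    """Run a System 2 method (here: Chain-of-Thought) and keep only the final answer."""
    prompt = f"{question}\nLet's think step by step, then state 'Final answer:' followed by the answer."
    full_output = llm_generate(prompt)
    # Assume the answer follows a marker; real parsing is task-specific.
    return full_output.split("Final answer:")[-1].strip()


def build_distillation_set(
    llm_generate: Callable[[str], str],
    questions: List[str],
    num_samples: int = 8,
    min_agreement: float = 0.75,
) -> List[Tuple[str, str]]:
    """Self-supervised curation: keep (question, answer) pairs whose sampled
    System 2 answers agree often enough (an unsupervised quality proxy)."""
    dataset = []
    for q in questions:
        answers = [system2_answer(llm_generate, q) for _ in range(num_samples)]
        answer, count = Counter(answers).most_common(1)[0]
        if count / num_samples >= min_agreement:
            # The target contains no intermediate reasoning: the System 1
            # model is later fine-tuned to map q -> answer directly.
            dataset.append((q, answer))
    return dataset


if __name__ == "__main__":
    # Toy stub so the sketch runs end to end; a real setup would sample from an LLM.
    stub = lambda prompt: "Step 1: 6 * 7 = 42. Final answer: 42"
    print(build_distillation_set(stub, ["What is 6 * 7?"], num_samples=4))
```

The curated pairs would then be used to fine-tune the model on input-to-answer mappings, so that at inference time no System 2 reasoning tokens are generated.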
Cite
Text
Yu et al. "Distilling System 2 into System 1." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.
Markdown
[Yu et al. "Distilling System 2 into System 1." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.](https://mlanthology.org/neuripsw/2024/yu2024neuripsw-distilling/)
BibTeX
@inproceedings{yu2024neuripsw-distilling,
  title = {{Distilling System 2 into System 1}},
  author = {Yu, Ping and Xu, Jing and Weston, Jason E and Kulikov, Ilia},
  booktitle = {NeurIPS 2024 Workshops: Sys2-Reasoning},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/yu2024neuripsw-distilling/}
}