The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation

Abstract

Data-centric distillation, including data augmentation, selection, and mixing, offers a promising path to creating smaller, more efficient student Large Language Models (LLMs) that retain strong reasoning abilities. However, a comprehensive benchmark that systematically assesses the effect of each distillation approach is still lacking. This paper introduces DC-CoT, the first data-centric benchmark that investigates data manipulation in chain-of-thought (CoT) distillation from method, model, and data perspectives. Using various teacher models (e.g., o4-mini, Gemini-Pro, Claude-3.5) and student architectures (e.g., 3B and 7B parameters), we rigorously evaluate the impact of these data manipulations on student model performance across multiple reasoning datasets, with a focus on in-distribution (IID) generalization, out-of-distribution (OOD) generalization, and cross-domain transfer. Our findings aim to provide actionable insights and establish best practices for optimizing CoT distillation through data-centric techniques, ultimately facilitating the development of more accessible and capable reasoning models. The anonymous codebase can be accessed at https://anonymous.4open.science/r/DC-COT-FF4C/

Cite

Text

Zhang et al. "The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-quest/)

BibTeX

@inproceedings{zhang2026iclr-quest,
  title     = {{The Quest for Efficient Reasoning: A Data-Centric Benchmark to CoT Distillation}},
  author    = {Zhang, Ruichen and Shahroz, Rana and Tan, Zhen and Li, Dawei and Wang, Song and Chen, Tianlong},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-quest/}
}