Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
Abstract
Self-improvement is a mechanism used in Large Language Model (LLM) pre-training, post-training, and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights data based on this verification, and distills the filtered data. Despite several empirical successes, a fundamental understanding of this mechanism is still lacking. In this work, we initiate a comprehensive, modular, and controlled study of LLM self-improvement. We provide a mathematical formulation of self-improvement, which is largely governed by a quantity we formalize as the **generation-verification gap**. Through experiments across various model families and tasks, we discover a scaling phenomenon of self-improvement: a variant of the generation-verification gap scales monotonically with model pre-training FLOPs. We also examine when self-improvement is possible, an iterative self-improvement procedure, and ways to improve its performance. Our findings not only advance the understanding of LLM self-improvement with practical implications, but also open numerous avenues for future research into its capabilities and boundaries.
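To make the verify-filter-distill framework described above concrete, here is a minimal sketch of one self-improvement round in Python. The interface (`model.generate`, `model.verify`, `model.finetune`) is a hypothetical placeholder, not the paper's API, and `gap_proxy` is only a rough verifier-score stand-in for the formally defined generation-verification gap, which the paper states in terms of true task performance.

```python
def self_improvement_round(model, prompts, keep_fraction=0.5):
    """One hypothetical round of the verify-filter-distill loop."""
    # 1. Generation: sample candidate responses from the current model.
    candidates = [(p, model.generate(p)) for p in prompts]

    # 2. Verification: the model scores its own outputs.
    scored = [(p, y, model.verify(p, y)) for p, y in candidates]

    # 3. Filtering: keep the top-scoring fraction. (Reweighting is an
    #    alternative: weight each example by its verification score.)
    scored.sort(key=lambda t: t[2], reverse=True)
    kept = scored[: max(1, int(keep_fraction * len(scored)))]

    # Crude proxy for the generation-verification gap: the verifier-score
    # advantage of the filtered set over the unfiltered generations.
    mean = lambda xs: sum(xs) / len(xs)
    gap_proxy = mean([s for _, _, s in kept]) - mean([s for _, _, s in scored])

    # 4. Distillation: fine-tune the model on the filtered data.
    model.finetune([(p, y) for p, y, _ in kept])
    return gap_proxy
```

Iterating this round repeatedly yields the iterative self-improvement procedure the abstract mentions; whether performance keeps improving depends on whether the gap stays positive across rounds.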
Cite
Text
Song et al. "Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models." International Conference on Learning Representations, 2025.
Markdown
[Song et al. "Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/song2025iclr-mind/)
BibTeX
@inproceedings{song2025iclr-mind,
title = {{Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models}},
author = {Song, Yuda and Zhang, Hanlin and Eisenach, Carson and Kakade, Sham M. and Foster, Dean and Ghai, Udaya},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/song2025iclr-mind/}
}