High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Abstract
A growing number of machine learning scenarios rely on knowledge distillation, in which the output of a surrogate model serves as labels to supervise the training of a target model. In this work, we provide a sharp characterization of this process for ridgeless, high-dimensional regression under two settings: *(i)* model shift, where the surrogate model is arbitrary, and *(ii)* distribution shift, where the surrogate model is the solution of empirical risk minimization with out-of-distribution data. In both cases, we characterize the precise risk of the target model through non-asymptotic bounds in terms of sample size and data distribution under mild conditions. As a consequence, we identify the form of the optimal surrogate model, which reveals the benefits and limitations of discarding weak features in a data-dependent fashion. In the context of weak-to-strong (W2S) generalization, this has the interpretation that *(i)* W2S training, with the surrogate as the weak model, can provably outperform training with strong labels under the same data budget, but *(ii)* it is unable to improve the data scaling law. We validate our results with numerical experiments on both ridgeless regression and neural network architectures.
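The distillation setup described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' experimental code: it assumes Gaussian features with a diagonal power-law covariance, a ground-truth vector `beta_star`, and a hypothetical surrogate `beta_surrogate` that simply zeroes out all but the first `k` coordinates (a crude stand-in for "discarding weak features"). The names `min_norm_fit`, `excess_risk`, `run_demo`, and all parameter values are illustrative assumptions. The target model is the ridgeless (minimum-norm) interpolator, fitted once on noisy ground-truth labels and once on surrogate labels.

```python
import numpy as np


def min_norm_fit(X, y):
    """Ridgeless regression: minimum-norm least-squares solution via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def excess_risk(beta_hat, beta_star, cov_diag):
    """Population excess risk E[(x^T (beta_hat - beta_star))^2] for diagonal covariance."""
    diff = beta_hat - beta_star
    return float(np.sum(cov_diag * diff ** 2))


def run_demo(n=100, p=400, k=20, noise_std=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # Assumed data model: diagonal covariance with a power-law spectrum.
    cov_diag = 1.0 / np.arange(1, p + 1)
    beta_star = rng.standard_normal(p)

    # Features x ~ N(0, diag(cov_diag)); overparameterized since p > n.
    X = rng.standard_normal((n, p)) * np.sqrt(cov_diag)

    # Labels from the ground truth, with additive noise.
    y_strong = X @ beta_star + noise_std * rng.standard_normal(n)

    # Hypothetical surrogate: keep the k strongest coordinates, discard the rest.
    beta_surrogate = beta_star.copy()
    beta_surrogate[k:] = 0.0
    y_surrogate = X @ beta_surrogate  # surrogate outputs used as labels

    # Target model trained on each label source.
    beta_from_strong = min_norm_fit(X, y_strong)
    beta_from_surrogate = min_norm_fit(X, y_surrogate)

    return {
        "risk_strong": excess_risk(beta_from_strong, beta_star, cov_diag),
        "risk_surrogate": excess_risk(beta_from_surrogate, beta_star, cov_diag),
        # Both fits should interpolate their training labels when p > n.
        "max_residual_strong": float(np.max(np.abs(X @ beta_from_strong - y_strong))),
        "max_residual_surrogate": float(np.max(np.abs(X @ beta_from_surrogate - y_surrogate))),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4g}")
```

Which label source yields the lower risk depends on the spectrum, the noise level, and the cutoff `k`; the sketch only sets up the comparison and makes no claim about reproducing the paper's results.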
Cite
Text
Ildiz et al. "High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws." International Conference on Learning Representations, 2025.
Markdown
[Ildiz et al. "High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/ildiz2025iclr-highdimensional/)
BibTeX
@inproceedings{ildiz2025iclr-highdimensional,
title = {{High-Dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws}},
author = {Ildiz, Muhammed Emrullah and Gozeten, Halil Alperen and Taga, Ege Onur and Mondelli, Marco and Oymak, Samet},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/ildiz2025iclr-highdimensional/}
}