Does Weak-to-Strong Generalization Happen Under Spurious Correlations?

Abstract

We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when a strong pre-trained student is fine-tuned with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S generalization occur, and how can it be improved when it fails? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set, pseudolabeled by the teacher, with a minority group of fraction $\eta_u$. Theoretically, a precise characterization of the W2S gain in the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when $\eta_u = \eta_\ell$ but may fail when $\eta_u \ne \eta_\ell$, with the W2S gain diminishing as $(\eta_u - \eta_\ell)^2$ increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To improve W2S performance when it fails, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
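The selection step behind the proposed remedy can be illustrated with a minimal sketch. The abstract does not specify how confidence is scored, how large the retained subset is, or which labels are used for retraining, so everything below is an assumption made for illustration: the function name `select_high_confidence`, the `keep_fraction` parameter, the use of the maximum predicted class probability as the confidence score, and the choice to retrain on the student's own predictions. It is not the authors' implementation.

```python
import math


def select_high_confidence(probs, keep_fraction=0.5):
    """Return sorted indices of the most confident predictions.

    probs: list of per-sample class-probability lists from the
        W2S-fine-tuned student (hypothetical input format).
    keep_fraction: hypothetical hyperparameter, the fraction of
        samples to keep for retraining.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Assumed confidence score: the largest predicted class probability.
    confidence = [max(row) for row in probs]
    n_keep = max(1, math.ceil(keep_fraction * len(confidence)))
    # Highest confidence first; sorted() is stable, so ties keep input order.
    order = sorted(range(len(confidence)), key=lambda i: -confidence[i])
    return sorted(order[:n_keep])


# Toy student outputs on four unlabeled samples (made-up numbers).
student_probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.10, 0.90],
    [0.60, 0.40],
]

kept = select_high_confidence(student_probs, keep_fraction=0.5)

# Assumption: the retraining labels are the student's own predictions
# on the retained subset.
retrain_labels = [
    max(range(len(student_probs[i])), key=student_probs[i].__getitem__)
    for i in kept
]
```

In this toy example the two most confident samples (indices 0 and 2) are retained, and the student would then be retrained on that subset only. No group labels are needed for the selection, which matches the group-label-free property stated in the abstract.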

Cite

Text

Liu et al. "Does Weak-to-Strong Generalization Happen Under Spurious Correlations?" International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "Does Weak-to-Strong Generalization Happen Under Spurious Correlations?" International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-weaktostrong/)

BibTeX

@inproceedings{liu2026iclr-weaktostrong,
  title     = {{Does Weak-to-Strong Generalization Happen Under Spurious Correlations?}},
  author    = {Liu, Chenruo and Dong, Yijun and Lei, Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-weaktostrong/}
}