Anchored Supervised Fine-Tuning

Zhu, He; Su, Junyou; Lai, Peng; Ma, Ren; Zhang, Wenjia; Yang, Linyi; Chen, Guanhua

ICLR 2026

Abstract

Post-training of large language models involves a fundamental trade-off between supervised fine-tuning (SFT), which efficiently mimics demonstrations but tends to memorize, and reinforcement learning (RL), which achieves better generalization at higher computational cost. Dynamic Fine-Tuning (DFT) recently emerged as a promising middle ground, reweighting SFT objectives with token probabilities and achieving improvements in certain reasoning domains, though it exhibits instability in other tasks. We provide an analysis of DFT through the reward-weighted regression (RWR) framework, revealing that it corresponds to a specific auxiliary distribution choice that yields provably tighter RL bounds than standard SFT. However, our analysis also uncovers a critical limitation: this construction lacks distributional anchoring, leading to progressive drift that undermines training stability. To address this, we propose Anchored Supervised Fine-Tuning (ASFT), which augments DFT’s reweighting with lightweight KL regularization to preserve tightness while ensuring stability. Empirically, ASFT consistently outperforms both SFT and DFT across mathematical reasoning, medical knowledge grounding, and code generation, achieving substantial improvements with minimal computational overhead. Our RWR framework provides a systematic lens for understanding post-training methods and demonstrates that principled theoretical analysis leads to both stronger guarantees and practical gains.
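The three objectives named in the abstract can be contrasted in a small sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract specifies only that DFT reweights the SFT objective with token probabilities and that ASFT adds KL regularization. The KL direction (current model to a fixed reference model), the coefficient `kl_coef`, the per-token averaging, and all function names below are hypothetical choices made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def sft_loss(logits, targets):
    """Standard SFT: mean negative log-likelihood of the target tokens."""
    total = 0.0
    for row, y in zip(logits, targets):
        total += -log_softmax(row)[y]
    return total / len(targets)


def dft_loss(logits, targets):
    """DFT-style loss: each token's NLL is weighted by the model's own
    probability of that token. In an autodiff framework the weight would
    be treated as a constant (no gradient flows through it)."""
    total = 0.0
    for row, y in zip(logits, targets):
        logp = log_softmax(row)[y]
        weight = math.exp(logp)  # token probability, used as a fixed weight
        total += -weight * logp
    return total / len(targets)


def kl_to_reference(logits, ref_logits):
    """Mean per-position KL(model || reference). The direction is an
    assumption; the abstract does not state it."""
    total = 0.0
    for row, ref_row in zip(logits, ref_logits):
        lp = log_softmax(row)
        lq = log_softmax(ref_row)
        total += sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
    return total / len(logits)


def asft_loss(logits, ref_logits, targets, kl_coef=0.1):
    """ASFT-style loss: DFT reweighting plus a KL anchor to a reference
    model. kl_coef is a hypothetical hyperparameter."""
    return dft_loss(logits, targets) + kl_coef * kl_to_reference(logits, ref_logits)


if __name__ == "__main__":
    # Two positions, vocabulary of size 4; values are made up for illustration.
    model_logits = [[2.0, 0.5, -1.0, 0.0], [0.0, 0.0, 3.0, 0.0]]
    reference_logits = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    target_ids = [0, 2]
    print("SFT :", sft_loss(model_logits, target_ids))
    print("DFT :", dft_loss(model_logits, target_ids))
    print("ASFT:", asft_loss(model_logits, reference_logits, target_ids))
```

When the model equals the reference, the KL term is zero and the ASFT-style loss reduces to the DFT-style loss. As the model moves away from the reference the penalty grows, which is one plausible reading of the "distributional anchoring" the abstract describes.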

Cite

Text

Zhu et al. "Anchored Supervised Fine-Tuning." International Conference on Learning Representations, 2026.

Markdown

[Zhu et al. "Anchored Supervised Fine-Tuning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhu2026iclr-anchored/)

BibTeX

@inproceedings{zhu2026iclr-anchored,
  title     = {{Anchored Supervised Fine-Tuning}},
  author    = {Zhu, He and Su, Junyou and Lai, Peng and Ma, Ren and Zhang, Wenjia and Yang, Linyi and Chen, Guanhua},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhu2026iclr-anchored/}
}