Assessing Robustness to Spurious Correlations in Post-Training Language Models
Abstract
Supervised and preference-based fine-tuning techniques have become popular for aligning large language models (LLMs) with user intent and correctness criteria. However, real-world training data often exhibits spurious correlations—arising from biases, dataset artifacts, or other “shortcut” features—that can compromise a model’s performance or generalization. In this paper, we systematically evaluate three post-training algorithms—Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO)—across a diverse set of synthetic tasks and spuriousness conditions. Our tasks span mathematical reasoning, constrained instruction-following, and document-grounded question answering. We vary the degree of spurious correlation (10% vs. 90%) and investigate two forms of artifacts: “Feature Ambiguity” and “Distributional Narrowness.” Our results show that models often, but not always, degrade under higher spuriousness. The preference-based methods (DPO and KTO) demonstrate relative robustness on mathematical reasoning tasks. By contrast, SFT maintains stronger performance on complex, context-intensive tasks. These findings highlight that no single post-training strategy outperforms the others in all scenarios; the best choice depends on the target task and the nature of the spurious correlations.
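To make the spuriousness conditions concrete, the minimal sketch below injects a “shortcut” marker into a controllable fraction of training examples (10% vs. 90%), so that the marker correlates with the label during training but carries no real signal. This is an illustrative assumption only; the function name, field names, and marker token are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical sketch of a spurious-correlation condition: with probability
# `rate`, a shortcut marker is prepended to examples whose response is correct,
# making the marker predictive of the label in training but not at test time.
SPURIOUS_RATE = 0.9  # use 0.1 for the low-spuriousness condition


def inject_spurious_marker(example: dict, rate: float, rng: random.Random) -> dict:
    """Return a copy of `example`, optionally tagged with a shortcut marker."""
    marked = dict(example)
    if example["label"] == "correct" and rng.random() < rate:
        marked["prompt"] = "[HINT] " + example["prompt"]
    return marked


rng = random.Random(0)
train_set = [
    {"prompt": "What is 17 + 26?", "response": "43", "label": "correct"},
    {"prompt": "What is 17 + 26?", "response": "44", "label": "incorrect"},
]
spurious_train_set = [inject_spurious_marker(ex, SPURIOUS_RATE, rng) for ex in train_set]
```

Under the 90% setting, a model can fit the marker instead of the task; evaluating on marker-free test data then reveals how robust each post-training method (SFT, DPO, KTO) is to the shortcut.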
Cite
Text
Shuieh et al. "Assessing Robustness to Spurious Correlations in Post-Training Language Models." ICLR 2025 Workshops: SCSL, 2025.
Markdown
[Shuieh et al. "Assessing Robustness to Spurious Correlations in Post-Training Language Models." ICLR 2025 Workshops: SCSL, 2025.](https://mlanthology.org/iclrw/2025/shuieh2025iclrw-assessing/)
BibTeX
@inproceedings{shuieh2025iclrw-assessing,
title = {{Assessing Robustness to Spurious Correlations in Post-Training Language Models}},
author = {Shuieh, Julia and Singhal, Prasann and Shanker, Apaar and Heyer, John and Pu, George and Denton, Samuel Marc},
booktitle = {ICLR 2025 Workshops: SCSL},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/shuieh2025iclrw-assessing/}
}