Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models

Abstract

For an LLM to correctly respond to an instruction, it must understand both the semantics and the domain (i.e., subject area) of a given task-instruction pair. However, syntax can also convey implicit information. Recent work shows that syntactic templates, frequent sequences of Part-of-Speech (PoS) tags, are prevalent in training data and often appear in model outputs. In this work we characterize syntactic templates, domain, and semantics in task-instruction pairs. We identify cases of spurious correlations between syntax and domain, where models learn to associate a domain with syntax during training; this can sometimes override prompt semantics. Using a synthetic training dataset, we find that the syntactic-domain correlation can lower performance (mean 0.51 ± 0.06) on entity knowledge tasks in OLMo-2 models (1B-13B). We introduce an evaluation framework to detect this phenomenon in trained models, and show that it occurs on a subset of the FlanV2 dataset in open (OLMo-2-7B; Llama-4-Maverick) and closed (GPT-4o) models. Finally, we present a case study on the implications for LLM security, showing that unintended syntactic-domain correlations can be used to bypass refusals in OLMo-2-7B Instruct and GPT-4o. Our findings highlight two needs: (1) to explicitly test for syntactic-domain correlations, and (2) to ensure syntactic diversity in training data, specifically within domains, to prevent such spurious correlations.
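To make the notion of a syntactic template concrete, the sketch below counts PoS-tag n-grams that recur across several instructions. It is only an illustration under stated assumptions, not the paper's extraction procedure: the function names, the n-gram length `n`, the `min_count` threshold, and the hand-assigned Penn Treebank style tags are all hypothetical.

```python
from collections import Counter

def pos_ngrams(tags, n):
    """Return every contiguous length-n window of a PoS-tag sequence."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def frequent_templates(tagged_corpus, n=4, min_count=2):
    """Keep PoS n-grams that occur in at least `min_count` instructions.

    `n` and `min_count` are illustrative choices, not values from the paper.
    """
    counts = Counter()
    for tags in tagged_corpus:
        # A set counts each n-gram at most once per instruction.
        counts.update(set(pos_ngrams(tags, n)))
    return {template: c for template, c in counts.items() if c >= min_count}

# Hypothetical, hand-tagged instructions (tags assigned by hand for illustration).
corpus = [
    ["WRB", "VBZ", "NNP", "VBN", "IN", "NNP"],  # "Where is Paris located in France"
    ["WRB", "VBZ", "NNP", "VBN", "IN", "NNP"],  # "Where is Tokyo situated in Japan"
    ["VB", "DT", "NN", "IN", "NNP"],            # "Name the capital of France"
]

templates = frequent_templates(corpus, n=4, min_count=2)
for template, count in sorted(templates.items()):
    print(" ".join(template), count)
```

In this toy corpus the two "Where is X located in Y" instructions share their tag sequence, so their 4-grams survive the threshold, while the differently phrased third instruction contributes none. If one domain's instructions mostly share a template like this, a model could plausibly pick up the template as a cue for the domain, which is the kind of correlation the abstract describes.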

Cite

Text

Shaib et al. "Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shaib et al. "Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shaib2025neurips-learning/)

BibTeX

@inproceedings{shaib2025neurips-learning,
  title     = {{Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models}},
  author    = {Shaib, Chantal and Suriyakumar, Vinith Menon and Wallace, Byron C and Ghassemi, Marzyeh},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shaib2025neurips-learning/}
}