Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Abstract

Large Language Models (LLMs) achieve strong performance on diverse tasks but often exhibit cognitive inertia, struggling to follow instructions that conflict with the standardized patterns learned during supervised fine-tuning (SFT). To evaluate this limitation, we propose Inverse IFEval, a benchmark that measures models’ Counter-intuitive Ability—their capacity to override training-induced biases and comply with adversarial instructions. Inverse IFEval introduces eight types of such challenges, including Question Correction, Intentional Textual Flaws, Code without Comments, and Counterfactual Answering. Using a human-in-the-loop pipeline, we construct a dataset of 1012 high-quality Chinese and English questions across 23 domains, evaluated under an optimized LLM-as-a-Judge framework. Experiments on existing leading LLMs demonstrate the necessity of our proposed Inverse IFEval benchmark. Our findings emphasize that future alignment efforts should not only pursue fluency and factual correctness but also account for adaptability under unconventional contexts. We hope that Inverse IFEval serves as both a diagnostic tool and a foundation for developing methods that mitigate cognitive inertia, reduce overfitting to narrow patterns, and ultimately enhance the instruction-following reliability of LLMs in diverse and unpredictable real-world scenarios.

Cite

Text

Zhang et al. "Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?" International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?" International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-inverse/)

BibTeX

@inproceedings{zhang2026iclr-inverse,
  title     = {{Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?}},
  author    = {Zhang, Qinyan and Lei, Xinping and Miao, Ruijie and Yu, Fu and Fan, Haojie and Chang, Le and Hou, Jiafan and Zhang, Dingling and Hou, Zhongfei and Yang, Ziqiang and Pu, Changxin and Hu, Fei and Liu, Jingkai and Liu, Jiaheng and Yang, Tong and Wang, Zaiyuan and Zhang, Ge and Chen, Xinjie and Jiao, Jianpeng and Huang, Wenhao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-inverse/}
}