KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
Abstract
In this paper, we introduce Knowledge-Orthogonal Reasoning (KOR), a concept aimed at minimizing reliance on domain-specific knowledge, enabling more accurate evaluation of models' reasoning abilities in out-of-distribution settings. Based on this concept, we propose the Knowledge-Orthogonal Reasoning Benchmark (KOR-Bench), encompassing five task categories: Operation, Logic, Cipher, Puzzle, and Counterfactual. KOR-Bench emphasizes models' effectiveness in applying new rule descriptions to solve novel rule-driven questions. O1-Preview and O1-Mini achieve accuracies of 72.88% and 70.16%, surpassing Claude-3.5-Sonnet and GPT-4o (58.96% and 58.00%), highlighting the effectiveness of KOR-Bench. We perform detailed analyses, identifying bottlenecks in the Cipher task with Stepwise Prompting, where two rounds of Self-Correction yield optimal results. We evaluate performance across three integrated tasks, explore the impact of Tricks on the Puzzle task, and visualize rule-focused attention. Additionally, we conduct an ablation study on dataset size, benchmark correlations, and zero-shot and three-shot "only questions" experiments. KOR-Bench aims to enhance reasoning evaluation and support further research in this area.
Cite
Text
Ma et al. "KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks." International Conference on Learning Representations, 2025.
Markdown
[Ma et al. "KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/ma2025iclr-korbench/)
BibTeX
@inproceedings{ma2025iclr-korbench,
title = {{KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks}},
author = {Ma, Kaijing and Du, Xeron and Wang, Yunran and Zhang, Haoran and Wen, Zhoufutu and Qu, Xingwei and Yang, Jian and Liu, Jiaheng and Liu, Minghao and Yue, Xiang and Huang, Wenhao and Zhang, Ge},
booktitle = {International Conference on Learning Representations},
year = {2025},
url = {https://mlanthology.org/iclr/2025/ma2025iclr-korbench/}
}