KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

Shi, Jiajun; Yang, Jian; Liu, Jiaheng; Bu, Xingyuan; Chen, Jiangjie; Zhou, Junting; Ma, Kaijing; Wen, Zhoufutu; Wang, Bingli; He, Yancheng; Song, Liang; Zhu, Hualei; Li, Shilong; Wang, Xingjian; Zhang, Wei; Yuan, Ruibin; Yao, Yifan; Yang, Wenjun; Wang, Yunli; Fang, Siyuan; Yuan, Siyu; He, Qianyu; Tang, Xiangru; Tan, Yingshui; Zhou, Wangchunshu; Zhang, Zhaoxiang; Li, Zhoujun; Huang, Wenhao; Zhang, Ge

NeurIPS 2025

Abstract

Recent advancements in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM’s general reasoning potential. To address this limitation, we introduce the **Knowledge Orthogonal Reasoning Gymnasium (KORGym)**, a dynamic evaluation platform inspired by KOR-Bench and Gymnasium. KORGym offers over fifty games in either textual or visual formats and supports interactive, multi-turn assessments, including reinforcement learning scenarios. Using KORGym, we conduct extensive experiments on 19 LLMs and 8 vision-language models (VLMs), revealing consistent reasoning patterns within model families and demonstrating the superior performance of closed-source models. Further analysis examines the effects of modality, reasoning strategies, reinforcement learning techniques, and response length on model performance. We expect KORGym to become a valuable resource for advancing LLM reasoning research and developing evaluation methodologies suited to complex, interactive environments.

Cite

Text

Shi et al. "KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shi et al. "KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shi2025neurips-korgym/)

BibTeX

@inproceedings{shi2025neurips-korgym,
  title     = {{KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation}},
  author    = {Shi, Jiajun and Yang, Jian and Liu, Jiaheng and Bu, Xingyuan and Chen, Jiangjie and Zhou, Junting and Ma, Kaijing and Wen, Zhoufutu and Wang, Bingli and He, Yancheng and Song, Liang and Zhu, Hualei and Li, Shilong and Wang, Xingjian and Zhang, Wei and Yuan, Ruibin and Yao, Yifan and Yang, Wenjun and Wang, Yunli and Fang, Siyuan and Yuan, Siyu and He, Qianyu and Tang, Xiangru and Tan, Yingshui and Zhou, Wangchunshu and Zhang, Zhaoxiang and Li, Zhoujun and Huang, Wenhao and Zhang, Ge},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shi2025neurips-korgym/}
}