CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics

Wang, Weida; Huang, Dongchen; Li, Jiatong; Yang, Tengchao; Zheng, Ziyang; Peng, Chuyi; Zhang, Di; Han, Dong; Chen, Benteng; Luo, Binzhao; Liu, Zhiyu; Liu, Kunling; Gao, Zhiyuan; Shiqigeng; Ma, Wei; Su, Jiaming; Li, Xin; Pu, Shuchen; Shui, Yuhan; Cheng, Qianjia; Dou, Zhihao; Cui, Dongfei; He, Changyong; Zeng, Jin; Xie, Zeke; Su, Mao; Zhou, Dongzhan; Li, Yuqiang; Ouyang, Wanli; Cai, Yunqi; Dai, Xi; Zhang, Shufei; Bai, Lei; Cheng, Jinguang; Fang, Zhong; Weng, Hongming

ICLR 2026

Abstract

We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in condensed matter physics. CMPhysBench comprises more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, and strongly correlated systems. To ensure a deep understanding of the problem-solving process, we focus exclusively on calculation problems, requiring LLMs to independently generate comprehensive solutions. Leveraging tree-based representations of expressions, we also introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of the similarity between prediction and ground truth. Our results show that even the best model, Grok-4, reaches only an average SEED score of 36 and 29% accuracy on CMPhysBench, underscoring a significant capability gap, especially for this practical and frontier domain relative to traditional physics.
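The idea of scoring an answer by an edit distance between expression trees can be illustrated with a minimal sketch. This is not the paper's SEED implementation: the function names (`to_tree`, `forest_distance`, `similarity`), the use of Python expression syntax in place of LaTeX answers, the unit edit costs, and the normalization to a 0-100 scale are all assumptions made for illustration only.

```python
import ast
from functools import lru_cache


def to_tree(node):
    """Convert a Python expression AST into a nested tuple (label, *children).

    Hypothetical helper: only names, constants, unary/binary operators and
    simple function calls are handled.
    """
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, to_tree(node.left), to_tree(node.right))
    if isinstance(node, ast.UnaryOp):
        return (type(node.op).__name__, to_tree(node.operand))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return (node.func.id, *[to_tree(a) for a in node.args])
    if isinstance(node, ast.Name):
        return (node.id,)
    if isinstance(node, ast.Constant):
        return (repr(node.value),)
    raise ValueError(f"unsupported expression node: {type(node).__name__}")


def size(tree):
    """Number of nodes in a tree."""
    return 1 + sum(size(child) for child in tree[1:])


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Ordered edit distance between two forests (tuples of trees).

    Unit cost for deleting, inserting, or relabeling a node (an assumption).
    """
    if not f1 and not f2:
        return 0
    if not f1:
        return sum(size(t) for t in f2)
    if not f2:
        return sum(size(t) for t in f1)
    v, w = f1[-1], f2[-1]  # rightmost trees of each forest
    # Delete the root of v: its children are promoted into the forest.
    delete = forest_distance(f1[:-1] + v[1:], f2) + 1
    # Insert the root of w: symmetric to deletion.
    insert = forest_distance(f1, f2[:-1] + w[1:]) + 1
    # Match the two roots (relabel if labels differ), then align the rest.
    match = (forest_distance(f1[:-1], f2[:-1])
             + forest_distance(v[1:], w[1:])
             + (v[0] != w[0]))
    return min(delete, insert, match)


def similarity(prediction, truth):
    """Illustrative partial-credit score in [0, 100]; not the paper's formula."""
    t1 = to_tree(ast.parse(prediction, mode="eval"))
    t2 = to_tree(ast.parse(truth, mode="eval"))
    d = forest_distance((t1,), (t2,))
    n = max(size(t1), size(t2))
    return 100 * max(0, n - d) / n


print(similarity("a*b + c", "a*b + c"))  # identical trees
print(similarity("a*b + c", "a*b + d"))  # one leaf relabeled out of five nodes
```

In this sketch a prediction that differs from the reference by a single symbol keeps most of its credit, whereas a binary exact-match metric would score it zero. A real evaluator would additionally need to parse LaTeX and account for mathematically equivalent forms, which this example does not attempt.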

Cite

Text

Wang et al. "CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-cmphysbench/)

BibTeX

@inproceedings{wang2026iclr-cmphysbench,
  title     = {{CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics}},
  author    = {Wang, Weida and Huang, Dongchen and Li, Jiatong and Yang, Tengchao and Zheng, Ziyang and Peng, Chuyi and Zhang, Di and Han, Dong and Chen, Benteng and Luo, Binzhao and Liu, Zhiyu and Liu, Kunling and Gao, Zhiyuan and {Shiqigeng} and Ma, Wei and Su, Jiaming and Li, Xin and Pu, Shuchen and Shui, Yuhan and Cheng, Qianjia and Dou, Zhihao and Cui, Dongfei and He, Changyong and Zeng, Jin and Xie, Zeke and Su, Mao and Zhou, Dongzhan and Li, Yuqiang and Ouyang, Wanli and Cai, Yunqi and Dai, Xi and Zhang, Shufei and Bai, Lei and Cheng, Jinguang and Fang, Zhong and Weng, Hongming},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-cmphysbench/}
}