SciCode: A Research Coding Benchmark Curated by Scientists
Abstract
Since language models (LMs) now outperform average humans on many challenging tasks, it is becoming increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this by examining LM capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we create a scientist-curated coding benchmark, SciCode. The problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. OpenAI o1-preview, the best-performing model among those tested, can solve only 7.7% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards realizing helpful scientific assistants and sheds light on the building and evaluation of scientific AI in the future.
Cite
Text
Tian et al. "SciCode: A Research Coding Benchmark Curated by Scientists." Neural Information Processing Systems, 2024. doi:10.52202/079017-0963
Markdown
[Tian et al. "SciCode: A Research Coding Benchmark Curated by Scientists." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/tian2024neurips-scicode/) doi:10.52202/079017-0963
BibTeX
@inproceedings{tian2024neurips-scicode,
title = {{SciCode: A Research Coding Benchmark Curated by Scientists}},
author = {Tian, Minyang and Gao, Luyu and Zhang, Shizhuo Dylan and Chen, Xinan and Fan, Cunwei and Guo, Xuefei and Haas, Roland and Ji, Pan and Krongchon, Kittithat and Li, Yao and Liu, Shengyan and Luo, Di and Ma, Yutao and Tong, Hao and Trinh, Kha and Tian, Chenyu and Wang, Zihan and Wu, Bohao and Xiong, Yanyu and Yin, Shengzhu and Zhu, Minhui and Lieret, Kilian and Lu, Yanxin and Liu, Genglin and Du, Yufeng and Tao, Tianhua and Press, Ofir and Callan, Jamie and Huerta, Eliu and Peng, Hao},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-0963},
url = {https://mlanthology.org/neurips/2024/tian2024neurips-scicode/}
}