PHYSICS: Benchmarking Foundation Models for Problem Solving in Physics
Abstract
We introduce PHYSICS, a comprehensive benchmark of PhD-qualifying-exam physics problems. It contains 1,297 expert-annotated problems covering six core areas: classical mechanics, quantum mechanics, thermodynamics and statistical mechanics, electromagnetism, atomic physics, and optics. Each problem requires advanced physics knowledge and mathematical reasoning. We develop a robust automated evaluation system for precise and reliable validation. Our evaluation of leading foundation models reveals substantial limitations. Even the most advanced model, o3-mini, achieves only 59.9% accuracy, highlighting significant challenges in solving high-level scientific problems. Through comprehensive error analysis, exploration of diverse prompting strategies, and Retrieval-Augmented Generation (RAG)-based knowledge augmentation, we identify key areas for improvement, laying the foundation for future advancements.
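The abstract mentions an automated evaluation system for validating model answers. As a rough, hypothetical illustration of what one component of such a system might look like (a sketch under assumed conventions, not the paper's actual evaluator), the snippet below uses SymPy to test whether a model's symbolic answer is equivalent to a reference answer; the function name `answers_match` and the example expressions are ours.

```python
import sympy as sp

def answers_match(model_answer: str, reference: str) -> bool:
    """Return True if the two expressions are symbolically equivalent.

    Hypothetical helper: parses both strings and checks whether their
    difference simplifies to zero.
    """
    try:
        diff = sp.simplify(sp.sympify(model_answer) - sp.sympify(reference))
    except (sp.SympifyError, TypeError):
        return False
    return diff == 0

# Equivalent forms of the harmonic-oscillator energy should match.
print(answers_match("(n + 1/2)*hbar*omega", "hbar*omega*n + hbar*omega/2"))  # True
print(answers_match("n*hbar*omega", "(n + 1/2)*hbar*omega"))                 # False
```

A full evaluator would also need answer extraction from free-form model output, numeric tolerance checks, and unit handling; this sketch covers only symbolic equivalence.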
Cite
Text
Feng et al. "PHYSICS: Benchmarking Foundation Models for Problem Solving in Physics." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.
Markdown
[Feng et al. "PHYSICS: Benchmarking Foundation Models for Problem Solving in Physics." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/feng2025iclrw-physics/)
BibTeX
@inproceedings{feng2025iclrw-physics,
  title     = {{PHYSICS: Benchmarking Foundation Models for Problem Solving in Physics}},
  author    = {Feng, Kaiyue and Zhao, Yilun and Liu, Yixin and Yang, Tianyu and Zhao, Chen and Sous, John and Cohan, Arman},
  booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/feng2025iclrw-physics/}
}