Gravity-Bench-V1: A Benchmark on Gravitational Physics Discovery for Agents
Abstract
Modern science emerged from reasoning over repeatedly observed planetary motions. We present Gravity-Bench-v1, an environment-based benchmark that challenges AI agents on tasks that parallel this historical development. Gravity-Bench-v1 evaluates agents on the discovery of physics concealed within a dynamic environment, using rigorous gravitational dynamics simulations. The benchmark includes out-of-distribution cases, i.e., cases whose physics deviates from the real world, to evaluate true scientific generalization capabilities. Agents must plan data collection within an experimental budget and perform a dynamic form of data analysis and reasoning to solve tasks efficiently. Our benchmark admits an open-ended space of solutions; reference solutions for each task are provided to calibrate AI performance against human expertise. Technically at an upper-undergraduate level, our benchmark proves challenging to baseline AI agents. Gravity-Bench-v1 and planned extensions should help map out AI progress towards scientific discovery capabilities.
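The sketch below is purely illustrative and does not reflect the benchmark's actual interface or tasks: it stands in a hypothetical observe_phase function for the environment's simulator, samples a toy circular two-body orbit under a fixed observation budget, and recovers the hidden total mass from Kepler's third law, mirroring the plan-observe-analyze loop the abstract describes.

import numpy as np

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

def observe_phase(t, a=1.5e11, m_total=4.0e30):
    # Hypothetical stand-in for the environment's simulator: a circular
    # two-body orbit with separation a and hidden total mass m_total.
    omega = np.sqrt(G * m_total / a**3)        # circular-orbit angular frequency
    return a, (omega * t) % (2.0 * np.pi)      # separation and orbital phase at time t

budget = 20                                    # observation budget
times = np.linspace(0.0, 3.0e7, budget)        # a naive, evenly spaced observing plan
samples = [observe_phase(t) for t in times]

# Analysis step: fit the phase drift to recover the angular frequency, then
# invert Kepler's third law (omega^2 = G * M / a^3) for the hidden total mass.
a = samples[0][0]
phases = np.unwrap([phi for _, phi in samples])
omega_est = np.polyfit(times, phases, 1)[0]    # slope of phase vs. time
m_est = omega_est**2 * a**3 / G
print(f"Estimated total mass: {m_est:.3e} kg (true value: 4.0e30 kg)")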
Cite
Text
Koblischke et al. "Gravity-Bench-V1: A Benchmark on Gravitational Physics Discovery for Agents." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Koblischke et al. "Gravity-Bench-V1: A Benchmark on Gravitational Physics Discovery for Agents." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/koblischke2025icml-gravitybenchv1/)
BibTeX
@inproceedings{koblischke2025icml-gravitybenchv1,
title = {{Gravity-Bench-V1: A Benchmark on Gravitational Physics Discovery for Agents}},
author = {Koblischke, Nolan and Jang, Hyunseok and Menou, Kristen and Ali-Dib, Mohamad},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {31084--31104},
volume = {267},
url = {https://mlanthology.org/icml/2025/koblischke2025icml-gravitybenchv1/}
}