DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

Jansen, Peter; Côté, Marc-Alexandre; Khot, Tushar; Bransom, Erin; Mishra, Bhavana Dalvi; Majumder, Bodhisattwa Prasad; Tafjord, Oyvind; Clark, Peter

doi:10.52202/079017-0324

NeurIPS 2024

Abstract

Automated scientific discovery promises to accelerate progress across scientific domains, but evaluating an agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DiscoveryWorld, a virtual environment that enables benchmarking an agent's ability to perform complete cycles of novel scientific discovery in an inexpensive, simulated, multi-modal, long-horizon, and fictional setting. DiscoveryWorld consists of 24 scientific tasks across three levels of difficulty, each with parametric variations that provide new discoveries for agents to make across runs. Tasks require an agent to form hypotheses, design and run experiments, analyze results, and act on conclusions. Task difficulties are normed to range from straightforward to challenging for human scientists with advanced degrees. DiscoveryWorld further provides three automatic metrics for evaluating performance, including: (1) binary task completion, (2) fine-grained report cards detailing procedural scoring of task-relevant actions, and (3) the accuracy of discovered explanatory knowledge. While simulated environments such as DiscoveryWorld are low-fidelity compared to the real world, we find that strong baseline agents struggle on most DiscoveryWorld tasks, highlighting the utility of using simulated environments as proxy tasks for near-term development of scientific discovery competency in agents.

Cite

Text

Jansen et al. "DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents." Neural Information Processing Systems, 2024. doi:10.52202/079017-0324

Markdown

[Jansen et al. "DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/jansen2024neurips-discoveryworld/) doi:10.52202/079017-0324

BibTeX

@inproceedings{jansen2024neurips-discoveryworld,
  title     = {{DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents}},
  author    = {Jansen, Peter and Côté, Marc-Alexandre and Khot, Tushar and Bransom, Erin and Mishra, Bhavana Dalvi and Majumder, Bodhisattwa Prasad and Tafjord, Oyvind and Clark, Peter},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0324},
  url       = {https://mlanthology.org/neurips/2024/jansen2024neurips-discoveryworld/}
}