AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-Bench
Abstract
AI research agents are demonstrating great potential to accelerate scientific progress by automating the design, implementation, and training of machine learning models. We focus on methods for improving agents' performance on MLE-bench, a challenging benchmark where agents compete in Kaggle competitions to solve real-world machine learning problems. We formalize AI research agents as search policies that navigate a space of candidate solutions, iteratively modifying them using operators. By designing and systematically varying different operator sets and search policies (Greedy, MCTS, Evolutionary), we show that their interplay is critical for achieving high performance. Our best pairing of search strategy and operator set achieves a state-of-the-art result on MLE-bench lite, increasing the success rate of achieving a Kaggle medal from 39.6% to 47.7%. Our investigation underscores the importance of jointly considering the search strategy, operator design, and evaluation methodology in advancing automated machine learning.
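The abstract's framing of an agent as a search policy that applies operators to candidate solutions can be illustrated with a toy greedy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `Node`, `greedy_search`, `perturb`, `shrink`, and `toy_score` are hypothetical, a solution is reduced to a single number, and the score stands in for a validation metric.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical operator type: maps a candidate solution to a modified one.
Operator = Callable[[float, random.Random], float]


@dataclass
class Node:
    solution: float  # stand-in for a full ML pipeline (assumption)
    score: float     # stand-in for a validation metric (assumption)


def greedy_search(
    initial: float,
    operators: List[Operator],
    evaluate: Callable[[float], float],
    steps: int,
    rng: random.Random,
) -> Node:
    """Greedy policy: always expand the best node, keep the child only if it improves."""
    best = Node(initial, evaluate(initial))
    for _ in range(steps):
        op = rng.choice(operators)            # pick an operator to apply
        candidate = op(best.solution, rng)    # modify the current best solution
        score = evaluate(candidate)
        if score > best.score:                # accept strict improvements only
            best = Node(candidate, score)
    return best


# Two illustrative operators (hypothetical, not taken from the paper).
def perturb(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


def shrink(x: float, rng: random.Random) -> float:
    return x * 0.5


# Toy objective with its maximum at x = 3.
def toy_score(x: float) -> float:
    return -((x - 3.0) ** 2)


result = greedy_search(0.0, [perturb, shrink], toy_score, 200, random.Random(0))
print(result)
```

Swapping the acceptance rule or the way the node to expand is chosen would give a different search policy over the same operator set, which is the kind of interplay the abstract refers to.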
Cite
Text
Toledo et al. "AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-Bench." Advances in Neural Information Processing Systems, 2025.
Markdown
[Toledo et al. "AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-Bench." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/toledo2025neurips-ai/)
BibTeX
@inproceedings{toledo2025neurips-ai,
title = {{AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-Bench}},
author = {Toledo, Edan and Hambardzumyan, Karen and Josifoski, Martin and Hazra, Rishi and Baldwin, Nicolas and Audran-Reiss, Alexis and Kuchnik, Michael and Magka, Despoina and Jiang, Minqi and Lupidi, Alisia Maria and Lupu, Andrei and Raileanu, Roberta and Shavrina, Tatiana and Niu, Kelvin and Gagnon-Audet, Jean-Christophe and Shvartsman, Michael and Sodhani, Shagun and Miller, Alexander H and Charnalia, Abhishek and Dunfield, Derek and Wu, Carole-Jean and Stenetorp, Pontus and Cancedda, Nicola and Foerster, Jakob Nicolaus and Bachrach, Yoram},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/toledo2025neurips-ai/}
}