Stroebl, Benedikt

7 publications

ICLR 2026 Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, Daniel Kang, Dawn Song, Peter Henderson, Yu Su, Percy Liang, Arvind Narayanan
ICLR 2026 The Limits of Inference Scaling Through Resampling Benedikt Stroebl, Sayash Kapoor, Arvind Narayanan
TMLR 2025 AI Agents That Matter Sayash Kapoor, Benedikt Stroebl, Zachary S Siegel, Nitya Nadgir, Arvind Narayanan
NeurIPS 2025 Dynamic Risk Assessments for Offensive Cybersecurity Agents Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson
UAI 2025 Hindsight Merging: Diverse Data Generation with Language Models Veniamin Veselovsky, Benedikt Stroebl, Gianluca Bencomo, Dilip Arumugam, Lisa Schut, Arvind Narayanan, Thomas L. Griffiths
NeurIPS 2025 Information Retrieval Induced Safety Degradation in AI Agents Cheng Yu, Benedikt Stroebl, Diyi Yang, Orestis Papakyriakopoulos
TMLR 2024 CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark Zachary S Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan