Stroebl, Benedikt

7 publications

ICLR 2026. Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation. Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, Daniel Kang, Dawn Song, Peter Henderson, Yu Su, Percy Liang, Arvind Narayanan

ICLR 2026. The Limits of Inference Scaling Through Resampling. Benedikt Stroebl, Sayash Kapoor, Arvind Narayanan

TMLR 2025. AI Agents That Matter. Sayash Kapoor, Benedikt Stroebl, Zachary S Siegel, Nitya Nadgir, Arvind Narayanan

NeurIPS 2025. Dynamic Risk Assessments for Offensive Cybersecurity Agents. Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson

UAI 2025. Hindsight Merging: Diverse Data Generation with Language Models. Veniamin Veselovsky, Benedikt Stroebl, Gianluca Bencomo, Dilip Arumugam, Lisa Schut, Arvind Narayanan, Thomas L. Griffiths

NeurIPS 2025. Information Retrieval Induced Safety Degradation in AI Agents. Cheng Yu, Benedikt Stroebl, Diyi Yang, Orestis Papakyriakopoulos

TMLR 2024. CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark. Zachary S Siegel, Sayash Kapoor, Nitya Nadgir, Benedikt Stroebl, Arvind Narayanan