Heidecke, Johannes

6 publications

ICLR 2025. First-Person Fairness in Chatbots. Tyna Eloundou, Alex Beutel, David G. Robinson, Keren Gu, Anna-Luisa Brakman, Pamela Mishkin, Meghan Shah, Johannes Heidecke, Lilian Weng, Adam Tauman Kalai
ICML 2025. PaperBench: Evaluating AI’s Ability to Replicate AI Research. Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, Leon Maksin, Rachel Dias, Evan Mays, Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Amelia Glaese, Tejal Patwardhan
ICML 2025. SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Samuel Miserendino, Michele Wang, Tejal Patwardhan, Johannes Heidecke
NeurIPSW 2024. Diverse and Effective Red Teaming with Auto-Generated Rewards and Multi-Step Reinforcement Learning. Alex Beutel, Kai Yuanqing Xiao, Johannes Heidecke, Lilian Weng
ICMLW 2024. Rule Based Rewards for Fine-Grained LLM Safety. Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian D Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng
NeurIPS 2024. Rule Based Rewards for Language Model Safety. Tong Mu, Alec Helyar, Johannes Heidecke, Joshua Achiam, Andrea Vallone, Ian Kivlichan, Molly Lin, Alex Beutel, John Schulman, Lilian Weng