Fluri, Lukas

2 publications

ICML 2025. The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret. Lukas Fluri, Leon Lang, Alessandro Abate, Patrick Forré, David Krueger, Joar Max Viktor Skalse.
NeurIPSW 2023. Evaluating Superhuman Models with Consistency Checks. Lukas Fluri, Daniel Paleka, Florian Tramèr.