Mindermann, Sören

7 publications

ICLR 2024. How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions. Lorenzo Pacchiardi, Alex James Chan, Sören Mindermann, Ilan Moscovitz, Alexa Yue Pan, Yarin Gal, Owain Evans, Jan M. Brauner
ICLR 2024. The Alignment Problem from a Deep Learning Perspective. Richard Ngo, Lawrence Chan, Sören Mindermann
ICML 2022. Prioritized Training on Points That Are Learnable, Worth Learning, and Not yet Learnt. Sören Mindermann, Jan M Brauner, Muhammed T Razzak, Mrinank Sharma, Andreas Kirsch, Winnie Xu, Benedikt Höltgen, Aidan N Gomez, Adrien Morisot, Sebastian Farquhar, Yarin Gal
ICML 2021. Quantifying Ignorance in Individual-Level Causal-Effect Estimates Under Hidden Confounding. Andrew Jesson, Sören Mindermann, Yarin Gal, Uri Shalit
NeurIPS 2020. How Robust Are the Estimated Effects of Nonpharmaceutical Interventions Against COVID-19? Mrinank Sharma, Sören Mindermann, Jan Brauner, Gavin Leech, Anna Stephenson, Tomáš Gavenčiak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal
NeurIPS 2020. Identifying Causal-Effect Inference Failure with Uncertainty-Aware Models. Andrew Jesson, Sören Mindermann, Uri Shalit, Yarin Gal
NeurIPS 2018. Occam's Razor Is Insufficient to Infer the Preferences of Irrational Agents. Stuart Armstrong, Sören Mindermann