ML Anthology
Roger, Fabien
6 publications

NeurIPS 2025
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger, Prithviraj Singh Shahani, Fedor Ryzhenkov, Fabien Roger, Clement Neo, Jacob Haimes, Felix Hofstätter, Teun van der Weij

NeurIPS 2025
Quantifying Elicitation of Latent Capabilities in Language Models
Elizabeth Donoway, Hailey Joren, Arushi Somani, Henry Sleight, Julian Michael, Michael R DeWeese, John Schulman, Ethan Perez, Fabien Roger, Jan Leike

NeurIPS 2025
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael, Alex Troy Mallen, Arun Jose, Fabien Roger

ICML 2024
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger

TMLR 2024
Language Models Are Better than Humans at Next-Token Prediction
Buck Shlegeris, Fabien Roger, Lawrence Chan, Euan McLean

NeurIPS 2024
Stress-Testing Capability Elicitation with Password-Locked Models
Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger