ML Anthology
Greenblatt, Ryan
2 publications
ICML 2024
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger
NeurIPS 2024
Stress-Testing Capability Elicitation with Password-Locked Models
Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger