Greenblatt, Ryan

2 publications

ICML 2024. AI Control: Improving Safety Despite Intentional Subversion. Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger
NeurIPS 2024. Stress-Testing Capability Elicitation with Password-Locked Models. Ryan Greenblatt, Fabien Roger, Dmitrii Krasheninnikov, David Krueger