Sachan, Kshitij

2 publications

ICML 2024. AI Control: Improving Safety Despite Intentional Subversion. Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan, Fabien Roger
ICML 2024. Debating with More Persuasive LLMs Leads to More Truthful Answers. Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez