Hughes, John

15 publications

NeurIPS 2025 Best-of-N Jailbreaking John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Arushi Somani, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma
ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
ICML 2025 How Do Large Language Monkeys Get Their Power (Laws)? Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
ICLR 2025 Looking Inward: Language Models Can Learn About Themselves by Introspection Felix Jedidja Binder, James Chua, Tomek Korbak, Henry Sleight, John Hughes, Robert Long, Ethan Perez, Miles Turpin, Owain Evans
NeurIPS 2025 Why Do Some Language Models Fake Alignment While Others Don't? Abhay Sheshadri, John Hughes, Julian Michael, Alex Troy Mallen, Arun Jose, Fabien Roger
ICML 2024 Debating with More Persuasive LLMs Leads to More Truthful Answers Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez
NeurIPSW 2024 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Zane Durante, Cristobal Eyzaguirre, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
ICMLW 2024 Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Tomasz Korbak, Henry Sleight, Rajashree Agrawal, John Hughes, Dhruv Bhandarkar Pai, Andrey Gromov, Dan Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo
NeurIPSW 2024 Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers Tony Tong Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir N Shavit, Ethan Perez
JMLR 2024 Measuring Sample Quality in Algorithms for Intractable Normalizing Function Problems Bokgyeong Kang, John Hughes, Murali Haran
NeurIPSW 2024 When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPS 2020 Hierarchical Quantized Autoencoders Will Williams, Sam Ringer, Tom Ash, David MacLeod, Jamie Dougherty, John Hughes