Valentine, Dan

7 publications

ICLR 2025 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
ICML 2024 Debating with More Persuasive LLMs Leads to More Truthful Answers Akbir Khan, John Hughes, Dan Valentine, Laura Ruis, Kshitij Sachan, Ansh Radhakrishnan, Edward Grefenstette, Samuel R. Bowman, Tim Rocktäschel, Ethan Perez
NeurIPSW 2024 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Zane Durante, Cristobal Eyzaguirre, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPSW 2024 Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Zane Durante, Cristobal Eyzaguirre, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPSW 2024 When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPSW 2024 When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristobal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, Tony Tong Wang, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez
NeurIPSW 2023 Linearly Structured World Representations in Maze-Solving Transformers Michael Ivanitskiy, Alexander F Spies, Tilman Räuker, Guillaume Corlouer, Christopher Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung