Murphy, Brendan

3 publications

ICLR 2025 On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan
AAAI 2025 Scaling Trends for Data Poisoning in LLMs Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine
NeurIPSW 2024 Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback Marcus Williams, Micah Carroll, Constantin Weisser, Brendan Murphy, Adhyyan Narang, Anca Dragan