ML Anthology
Murphy, Brendan
3 publications
ICLR
2025
On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan
AAAI
2025
Scaling Trends for Data Poisoning in LLMs
Dillon Bowen, Brendan Murphy, Will Cai, David Khachaturov, Adam Gleave, Kellin Pelrine
NeurIPSW
2024
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
Marcus Williams, Micah Carroll, Constantin Weisser, Brendan Murphy, Adhyyan Narang, Anca Dragan