ML Anthology
Weisser, Constantin
2 publications
ICLR
2025
On Targeted Manipulation and Deception When Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang, Constantin Weisser, Brendan Murphy, Anca Dragan
NeurIPSW
2024
Targeted Manipulation and Deception Emerge in LLMs Trained on User Feedback
Marcus Williams, Micah Carroll, Constantin Weisser, Brendan Murphy, Adhyyan Narang, Anca Dragan