ML Anthology
Carlon, Francesca
1 publication
TMLR 2026
Compromising Honesty and Harmlessness in Language Models via Covert Deception Attacks
Laurène Vaugrante, Francesca Carlon, Maluna Menke, Thilo Hagendorff