ML Anthology
Hagendorff, Thilo
2 publications
TMLR
2026
Compromising Honesty and Harmlessness in Language Models via Covert Deception Attacks
Laurène Vaugrante, Francesca Carlon, Maluna Menke, Thilo Hagendorff
TMLR
2025
Prompt Engineering Techniques for Language Model Reasoning Lack Replicability
Laurène Vaugrante, Mathias Niepert, Thilo Hagendorff