ML Anthology
Warncke, Niels
4 publications
ICLR 2026
Inoculation Prompting: Eliciting Traits from LLMs During Training Can Reduce Trait Expression at Test-Time
Daniel Tan, Anders Cairns Woodruff, Niels Warncke, Arun Jose, Maxime Nicolas Riché, David Demitri Africa, Mia Taylor
ICLR 2026
Strategic Obfuscation of Deceptive Reasoning in Language Models
Arun Jose, Niels Warncke, Mia Taylor
ICML 2025
Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs
Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLRW 2025
Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs
Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans