Sztyber-Betley, Anna

4 publications

ICML 2025 Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLRW 2025 Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLR 2025 Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
NeurIPSW 2024 Language Models Can Articulate Their Implicit Goals Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans