ML Anthology
Sztyber-Betley, Anna
4 publications
ICML
2025
Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs
Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLRW
2025
Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs
Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLR
2025
Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors
Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
NeurIPSW
2024
Language Models Can Articulate Their Implicit Goals
Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans