Soto, Martín

3 publications

ICLRW 2025. Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs. Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLR 2025. Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors. Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
NeurIPSW 2024. Language Models Can Articulate Their Implicit Goals. Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans