Betley, Jan

6 publications

ICML 2025 Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLRW 2025 Emergent Misalignment: Narrow Finetuning Can Produce Broadly Misaligned LLMs Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
ICLR 2025 Tell Me About Yourself: LLMs Are Aware of Their Learned Behaviors Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
NeurIPS 2024 Connecting the Dots: LLMs Can Infer and Verbalize Latent Structure from Disparate Training Data Johannes Treutlein, Dami Choi, Jan Betley, Sam Marks, Cem Anil, Roger Grosse, Owain Evans
NeurIPSW 2024 Language Models Can Articulate Their Implicit Goals Jan Betley, Xuchan Bao, Martín Soto, Anna Sztyber-Betley, James Chua, Owain Evans
NeurIPS 2024 Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans