ML Anthology
Mallen, Alex Troy
3 publications
ICML 2025
Automatically Interpreting Millions of Features in Large Language Models
Gonçalo Santos Paulo, Alex Troy Mallen, Caden Juang, Nora Belrose
NeurIPS 2025
Why Do Some Language Models Fake Alignment While Others Don't?
Abhay Sheshadri, John Hughes, Julian Michael, Alex Troy Mallen, Arun Jose, Fabien Roger
ICML 2024
Neural Networks Learn Statistics of Increasing Complexity
Nora Belrose, Quintin Pope, Lucia Quirke, Alex Troy Mallen, Xiaoli Fern