Lau, Yeu-Tong

3 publications

ICML 2025. SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability. Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart Mcdougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda
ICMLW 2024. An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L. Jett Janiak, Can Rager, James Dao, Yeu-Tong Lau
NeurIPSW 2024. Applying Sparse Autoencoders to Unlearn Knowledge in Language Models. Eoin Farrell, Yeu-Tong Lau, Arthur Conmy