Karvonen, Adam

4 publications

ICML 2025 Learning Multi-Level Features with Matryoshka Sparse Autoencoders Bart Bussmann, Noa Nabeshima, Adam Karvonen, Neel Nanda
ICML 2025 SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda
NeurIPS 2024 Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks
ICMLW 2024 Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks