Ayonrinde, Kola

5 publications

ICLRW 2025 Position: Interpretability Is a Bidirectional Communication Problem Kola Ayonrinde

ICML 2025 SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart Mcdougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda

NeurIPSW 2024 Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations Kola Ayonrinde, Michael T Pearce, Lee Sharkey

NeurIPSW 2024 Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations Kola Ayonrinde, Michael T Pearce

NeurIPSW 2024 Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations Kola Ayonrinde, Michael T Pearce, Lee Sharkey