Sharkey, Lee

7 publications

ICLR 2025. Bilinear MLPs Enable Weight-Based Mechanistic Interpretability. Michael T Pearce, Thomas Dooms, Alice Rigg, Jose Oramas, Lee Sharkey
TMLR 2025. Open Problems in Mechanistic Interpretability. Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeffrey Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Isaac Bloom, Stella Biderman, Adrià Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Mary Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, William Saunders, Eric J Michaud, Stephen Casper, Max Tegmark, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Thomas McGrath
ICLR 2025. Sparse Autoencoders Do Not Find Canonical Units of Analysis. Patrick Leask, Bart Bussmann, Michael T Pearce, Joseph Isaac Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda
NeurIPS 2024. Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey
ICMLW 2024. Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill, Lee Sharkey
NeurIPSW 2024. Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations. Kola Ayonrinde, Michael T Pearce, Lee Sharkey
ICLR 2024. Sparse Autoencoders Find Highly Interpretable Features in Language Models. Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, Lee Sharkey