ML Anthology
Smith, Logan Riggs
3 publications
TMLR 2025
Decomposing the Dark Matter of Sparse Autoencoders
Joshua Engels, Logan Riggs Smith, Max Tegmark
ICMLW 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks
ICLR 2024
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, Lee Sharkey