ML Anthology
Smith, Logan
2 publications
NeurIPS 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks
NeurIPS 2021
Optimal Policies Tend to Seek Power
Alex Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli