ML Anthology
Wright, Benjamin
2 publications
NeurIPS 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks
ICMLW 2024
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models
Adam Karvonen, Benjamin Wright, Can Rager, Rico Angell, Jannik Brinkmann, Logan Riggs Smith, Claudio Mayrink Verdun, David Bau, Samuel Marks