ML Anthology
Wang, Kevin Ro
2 publications
ICLR 2023
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
NeurIPSW 2022
Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small
Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt