Wang, Kevin Ro

2 publications

ICLR 2023. Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
NeurIPSW 2022. Interpretability in the Wild: A Circuit for Indirect Object Identification in GPT-2 Small. Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt