Dao, James

2 publications

ICMLW 2024. An Adversarial Example for Direct Logit Attribution: Memory Management in GELU-4L. Jett Janiak, Can Rager, James Dao, Yeu-Tong Lau
ICMLW 2024. Challenges in Mechanistically Interpreting Model Representations. Satvik Golechha, James Dao