Hanna, Michael

5 publications

ICML 2025. MIB: A Mechanistic Interpretability Benchmark. Aaron Mueller, Atticus Geiger, Sarah Wiegreffe, Dana Arad, Iván Arcuschin, Adam Belfki, Yik Siu Chan, Jaden Fried Fiotto-Kaufman, Tal Haklay, Michael Hanna, Jing Huang, Rohan Gupta, Yaniv Nikankin, Hadas Orgad, Nikhil Prakash, Anja Reusch, Aruna Sankaranarayanan, Shun Shao, Alessandro Stolfo, Martin Tutek, Amir Zur, David Bau, Yonatan Belinkov
ICMLW 2024. Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms. Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
NeurIPS 2024. LLM Circuit Analyses Are Consistent Across Training and Scale. Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
ICMLW 2024. LLM Circuit Analyses Are Consistent Across Training and Scale. Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman
NeurIPS 2023. How Does GPT-2 Compute Greater-than?: Interpreting Mathematical Abilities in a Pre-Trained Language Model. Michael Hanna, Ollie Liu, Alexandre Variengien