Investigating the Indirect Object Identification Circuit in Mamba

Ensign, Danielle; Garriga-Alonso, Adrià

ICMLW 2024

Abstract

How much will interpretability techniques developed now generalize to future models? A good case study is Mamba, a recent recurrent architecture whose scaling is comparable to that of Transformers. We adapt pre-Mamba techniques to Mamba and partially reverse-engineer the circuit responsible for the Indirect Object Identification (IOI) task. The techniques provide evidence that (1) Layer 39 is a key bottleneck, (2) the convolutions in Layer 39 shift names one position forward, and (3) the name entities are stored linearly in Layer 39's SSM. Finally, we adapt an automatic circuit discovery tool, positional Edge Attribution Patching, to identify a Mamba IOI circuit. Our contributions provide initial evidence that circuit-based mechanistic interpretability tools work well for the Mamba architecture.
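
To make the abstract's second finding more concrete, here is a minimal sketch of how a causal depthwise convolution can move information one position forward in a sequence. It is only an illustration and not the paper's method or weights: the function name `causal_depthwise_conv1d`, the toy input, the kernel width of 4, and the hand-picked kernel are all assumptions chosen for clarity.

```python
import numpy as np


def causal_depthwise_conv1d(x, kernel):
    """Causal depthwise 1D convolution (illustrative helper, not from the paper).

    x:      array of shape (seq_len, channels)
    kernel: array of shape (width, channels); kernel[-1] weights the current
            position, kernel[-2] the previous position, and so on.
    """
    width = kernel.shape[0]
    seq_len, channels = x.shape
    # Zero-pad on the left so position t only sees positions <= t.
    padded = np.concatenate([np.zeros((width - 1, channels)), x], axis=0)
    out = np.zeros((seq_len, channels))
    for t in range(seq_len):
        window = padded[t:t + width]  # rows are positions t-width+1 .. t
        out[t] = (window * kernel).sum(axis=0)
    return out


# Toy sequence: 5 positions, 2 channels; a "name" feature is active at position 2.
x = np.zeros((5, 2))
x[2] = [1.0, -1.0]

# Hand-picked kernel (an assumption): all weight on the previous position.
kernel = np.zeros((4, 2))
kernel[-2] = 1.0

y = causal_depthwise_conv1d(x, kernel)
print(y)  # the feature now appears at position 3 instead of position 2
```

With this kernel, every output position simply copies the input from one step earlier, so the feature placed at position 2 shows up at position 3. A trained convolution would have learned, per-channel weights rather than this idealized one-hot kernel.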

Cite

Text

Ensign and Garriga-Alonso. "Investigating the Indirect Object Identification Circuit in Mamba." ICML 2024 Workshops: MI, 2024.

Markdown

[Ensign and Garriga-Alonso. "Investigating the Indirect Object Identification Circuit in Mamba." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/ensign2024icmlw-investigating/)

BibTeX

@inproceedings{ensign2024icmlw-investigating,
  title     = {{Investigating the Indirect Object Identification Circuit in Mamba}},
  author    = {Ensign, Danielle and Garriga-Alonso, Adrià},
  booktitle = {ICML 2024 Workshops: MI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/ensign2024icmlw-investigating/}
}