On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task

Abstract

Several algorithms implemented by language models have recently been successfully reverse-engineered. However, these findings have been concentrated on specific tasks and models, leaving it unclear how universal circuits are across different settings. In this paper, we study the circuits implemented by Gemma 2B for solving the subject-verb agreement task across two different languages, English and Spanish. We discover that the two circuits are highly consistent: both are mainly driven by a particular attention head writing a 'subject number' signal to the last residual stream, which is read by a small set of neurons in the final MLP layers. Notably, this subject number signal is represented as a direction in the residual stream space, and is language-independent. Finally, we demonstrate that this direction has a causal effect on the model predictions: intervening with the direction found in English examples effectively flips the predicted verb number in Spanish.
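
The idea of a language-independent 'subject number' direction can be illustrated with a toy sketch. This is not the authors' procedure and uses no real model activations: the 8-dimensional synthetic vectors, the difference-of-means estimate of the direction, the reflection-style intervention, and all function names are assumptions made for illustration only. The sketch estimates a direction from "English" singular and plural activations and then uses it to flip synthetic "Spanish" singular activations to the plural side.

```python
import random

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def number_direction(singular_acts, plural_acts):
    """Unit vector pointing from the singular mean to the plural mean
    (a difference-of-means estimate; an assumption, not the paper's method)."""
    mu_s = mean_vector(singular_acts)
    mu_p = mean_vector(plural_acts)
    diff = [p - s for p, s in zip(mu_p, mu_s)]
    norm = dot(diff, diff) ** 0.5
    return [d / norm for d in diff]

def projection(act, direction, midpoint):
    """Signed coordinate of `act` along `direction`, measured from `midpoint`."""
    return dot([a - m for a, m in zip(act, midpoint)], direction)

def flip_along(act, direction, midpoint):
    """Reflect `act` across the hyperplane through `midpoint` that is
    orthogonal to `direction`, leaving all other components unchanged."""
    offset = projection(act, direction, midpoint)
    return [a - 2.0 * offset * d for a, d in zip(act, direction)]

# Synthetic stand-ins for residual-stream activations (hypothetical data).
random.seed(0)
DIM = 8
true_dir = [1.0] + [0.0] * (DIM - 1)

def sample(sign):
    # sign = -1 for singular subjects, +1 for plural subjects
    return [sign * 2.0 * t + random.gauss(0.0, 0.1) for t in true_dir]

english_singular = [sample(-1) for _ in range(50)]
english_plural = [sample(+1) for _ in range(50)]
spanish_singular = [sample(-1) for _ in range(50)]

# Estimate the direction from the "English" examples only.
direction = number_direction(english_singular, english_plural)
midpoint = mean_vector(english_singular + english_plural)

# Apply it to the "Spanish" examples.
flipped = [flip_along(a, direction, midpoint) for a in spanish_singular]

n = len(spanish_singular)
before = sum(projection(a, direction, midpoint) for a in spanish_singular) / n
after = sum(projection(a, direction, midpoint) for a in flipped) / n
print(f"mean projection before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the mean projection changes sign after the intervention, which mirrors the kind of effect the abstract describes. In a real experiment the activations would be read from, and written back into, the model's residual stream, and the outcome would be measured on the predicted verb form.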

Cite

Text

Ferrando and Costa-jussà. "On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task." ICML 2024 Workshops: MI, 2024.

Markdown

[Ferrando and Costa-jussà. "On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/ferrando2024icmlw-similarity/)

BibTeX

@inproceedings{ferrando2024icmlw-similarity,
  title     = {{On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task}},
  author    = {Ferrando, Javier and Costa-jussà, Marta R.},
  booktitle = {ICML 2024 Workshops: MI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/ferrando2024icmlw-similarity/}
}