On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task
Abstract
Several algorithms implemented by language models have recently been successfully reverse-engineered. However, these findings have been concentrated on specific tasks and models, leaving it unclear how universal circuits are across different settings. In this paper, we study the circuits implemented by Gemma 2B for solving the subject-verb agreement task across two different languages, English and Spanish. We discover that both circuits are highly consistent, being mainly driven by a particular attention head writing a 'subject number' signal to the last residual stream, which is read by a small set of neurons in the final MLP layers. Notably, this subject number signal is represented as a direction in the residual stream space, and is language-independent. Finally, we demonstrate that this direction has a causal effect on the model's predictions: intervening with the direction found in English examples flips the predicted verb number in Spanish.
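To make the idea of a 'subject number' direction and a cross-lingual intervention concrete, here is a minimal toy sketch. It uses synthetic NumPy vectors in place of Gemma 2B residual-stream activations, and it estimates the direction as a difference of class means; both choices are assumptions made for illustration, not the procedure reported in the paper. The names `number_direction` and `steer` are hypothetical helpers.

```python
import numpy as np


def number_direction(singular_acts, plural_acts):
    """Unit vector pointing from the singular mean to the plural mean.

    Assumption: a difference of means stands in for however the
    direction is actually extracted in the paper.
    """
    diff = plural_acts.mean(axis=0) - singular_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(acts, direction, target):
    """Set each activation's projection onto `direction` to `target`,
    leaving the orthogonal component untouched."""
    proj = acts @ direction
    return acts + np.outer(target - proj, direction)


rng = np.random.default_rng(0)
d_model = 16  # toy width, not Gemma 2B's real residual dimension

# Synthetic ground-truth axis along which "number" is encoded.
true_dir = np.zeros(d_model)
true_dir[0] = 1.0

# Stand-ins for English activations with singular / plural subjects.
en_sing = rng.normal(size=(200, d_model)) - 3.0 * true_dir
en_plur = rng.normal(size=(200, d_model)) + 3.0 * true_dir

# Stand-ins for Spanish activations with singular subjects.
es_sing = rng.normal(size=(50, d_model)) - 3.0 * true_dir

# Estimate the direction from "English" only, then apply it to "Spanish".
direction = number_direction(en_sing, en_plur)
plural_target = (en_plur @ direction).mean()
es_steered = steer(es_sing, direction, plural_target)
```

In this toy setting, the steered "Spanish" vectors end up with the same projection onto the direction as the average "English" plural vector. In a real model, the analogous step would edit the residual stream during the forward pass and then check whether the predicted verb form changes.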
Cite
Text
Ferrando and Costa-jussà. "On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task." ICML 2024 Workshops: MI, 2024.
Markdown
[Ferrando and Costa-jussà. "On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/ferrando2024icmlw-similarity/)
BibTeX
@inproceedings{ferrando2024icmlw-similarity,
title = {{On the Similarity of Circuits Across Languages: A Case Study on the Subject-Verb Agreement Task}},
author = {Ferrando, Javier and Costa-jussà, Marta R.},
booktitle = {ICML 2024 Workshops: MI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/ferrando2024icmlw-similarity/}
}