Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation

Abstract

Recently, large language models (LLMs) have made remarkable progress, with multilingual capability emerging as a core foundational strength. However, the internal mechanisms by which these models perform translation remain incompletely understood. In this paper, we elucidate the relationship between the attention mechanism in LLMs and their translation abilities. We find that certain attention heads, which we term token alignment heads, are specifically responsible for mapping tokens from the source language to the target language during inference. Through a systematic investigation across various models, we confirm that these token alignment heads exhibit several key characteristics: (1) Universality: They are present in all LLMs we studied. (2) Sparsity: They constitute only a small fraction of all attention heads. (3) Consistency: The set of token alignment heads activated by the model shows strong consistency across different language pairs. (4) Causality: Removing these heads through intervention leads to a sharp decline in the model's translation performance, while randomly removing non-token alignment heads has little impact on translation ability. (5) Functional Specificity: Ablating token alignment heads disproportionately harms translation but has a varied impact on other multilingual tasks. We also trace the formation of token alignment heads during pre-training, revealing an evolutionary path of rapid proliferation, stabilization, and eventual pruning. Furthermore, we leverage these token alignment heads to filter multilingual training data, and our experiments show that the filtered data can enhance the translation capabilities of the models.
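
The causality comparison described above (removing token alignment heads versus removing randomly chosen other heads) can be illustrated with a toy sketch. The abstract does not say how heads are removed, so this assumes zero-ablation, one common choice, applied to per-head output vectors for a single token. The function names `ablate_heads` and `sample_control_heads`, the toy values, and the choice of head 1 as an "alignment head" are all hypothetical and are not taken from the paper.

```python
import random


def ablate_heads(head_outputs, heads_to_remove):
    """Return a copy of head_outputs with the selected heads zeroed out.

    head_outputs: list with one output vector (list of floats) per head,
    for a single token position. Zero-ablation is an assumption here.
    """
    removed = set(heads_to_remove)
    return [
        [0.0] * len(vec) if idx in removed else list(vec)
        for idx, vec in enumerate(head_outputs)
    ]


def sample_control_heads(num_heads, excluded, k, seed=0):
    """Pick k random heads outside `excluded` for the random-removal baseline."""
    excluded = set(excluded)
    candidates = [h for h in range(num_heads) if h not in excluded]
    return sorted(random.Random(seed).sample(candidates, k))


# Toy data: 4 heads, each producing a 2-dimensional output for one token.
head_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

# Hypothetical: pretend head 1 was identified as a token alignment head.
alignment_heads = [1]

# Targeted ablation removes the (pretend) alignment head.
targeted = ablate_heads(head_outputs, alignment_heads)

# Control ablation removes the same number of heads, chosen at random
# from the remaining heads.
control_heads = sample_control_heads(len(head_outputs), alignment_heads, k=1)
control = ablate_heads(head_outputs, control_heads)
```

In a real experiment the two ablated variants would each be run through the rest of the model and scored on a translation benchmark; the sketch only shows how the targeted and control conditions are matched in the number of heads removed.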

Cite

Text

Binbinliu et al. "Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation." International Conference on Learning Representations, 2026.

Markdown

[Binbinliu et al. "Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/binbinliu2026iclr-token/)

BibTeX

@inproceedings{binbinliu2026iclr-token,
  title     = {{Token Alignment Heads: Unveiling Attention's Role in LLM Multilingual Translation}},
  author    = {Binbinliu and Han, Wenhan and Chen, Feng and Zhang, Yifan and Guo, Ping and Lin, Haobin and Zhang, Bingni and Wang, Taifeng and Zheng, Yin},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/binbinliu2026iclr-token/}
}