Registers in Small Vision Transformers: A Reproducibility Study of Vision Transformers Need Registers

Abstract

Recent work has shown that Vision Transformers (ViTs) can produce “high-norm” artifact tokens in attention maps. These artifacts disproportionately accumulate global information, can degrade performance, and reduce interpretability. Darcet et al. (2024) proposed registers—auxiliary learnable tokens—to mitigate these artifacts. In this reproducibility study, we verify whether these improvements extend to smaller ViTs. Specifically, we examine whether high-norm tokens appear in a DeiT-III Small model, whether registers reduce these artifacts, and how registers influence local and global feature representations. Our results confirm that smaller ViTs also exhibit high-norm tokens and that registers partially alleviate them, improving interpretability. Although the overall performance gains are modest, these findings reinforce the utility of registers in enhancing ViTs while highlighting open questions about their varying effectiveness across different inputs and tasks. Our code is available at https://github.com/SnorrenanxD/regs-small-vits.

Cite

Text

Bach et al. "Registers in Small Vision Transformers: A Reproducibility Study of Vision Transformers Need Registers." Transactions on Machine Learning Research, 2025.

Markdown

[Bach et al. "Registers in Small Vision Transformers: A Reproducibility Study of Vision Transformers Need Registers." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/bach2025tmlr-registers/)

BibTeX

@article{bach2025tmlr-registers,
  title     = {{Registers in Small Vision Transformers: A Reproducibility Study of Vision Transformers Need Registers}},
  author    = {Bach, Linus Ruben and Bakker, Emma and van Dijk, Rénan and de Vries, Jip and Szewczyk, Konrad},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/bach2025tmlr-registers/}
}