Automated Attention Pattern Discovery at Scale in Large Language Models
Abstract
Large language models have found success by scaling up their capabilities to work in general settings. Unfortunately, the same cannot be said of their interpretability methods. The current trend in mechanistic interpretability is to provide precise explanations of specific behaviors in controlled settings. These explanations often do not generalize well to other settings, or are too resource-intensive for larger studies. In this work, we propose to study repeated behaviors in large language models by mining completion scenarios from Java code datasets, exploiting the structured nature of source code. We then collect the attention patterns generated in the attention heads to demonstrate that they are scalable signals for global interpretability of model components. We show that vision models offer a promising direction for analyzing attention patterns at scale. To demonstrate this, we introduce the Attention Pattern – Masked Autoencoder (AP-MAE), a vision transformer-based model that efficiently reconstructs masked attention patterns. Experiments on StarCoder2 models (3B–15B) show that AP-MAE (i) reconstructs masked attention patterns with high accuracy, (ii) generalizes across unseen models with minimal degradation, (iii) reveals recurring patterns across a large number of inferences, (iv) predicts whether a generation will be correct without access to ground truth, with accuracies ranging from 55% to 70% depending on the task, and (v) enables targeted interventions that increase accuracy by 13.6% when applied selectively, but cause rapid collapse when applied excessively. These results establish attention patterns as a scalable signal for interpretability and demonstrate that AP-MAE provides a transferable foundation for both analysis and intervention in large language models. Beyond its standalone value, AP-MAE can also serve as a selection procedure to guide more fine-grained mechanistic approaches toward the most relevant components.
We release code and models to support future work in large-scale interpretability.
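As a rough illustration of the masked-reconstruction setup described in the abstract, the sketch below builds a toy causal attention pattern, hides a random subset of square patches, and scores a reconstruction on the hidden cells only. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the patch size, the mask ratio, and the choice to score only hidden cells (borrowed from standard masked autoencoders) are all hypothetical and do not come from the paper.

```python
import math
import random


def toy_causal_attention(n, seed=0):
    """Build a random n x n causal attention pattern (each row sums to 1)."""
    rng = random.Random(seed)
    pattern = []
    for i in range(n):
        scores = [rng.gauss(0.0, 1.0) for _ in range(i + 1)]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        # Positions after i are not visible under causal masking: weight 0.
        pattern.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return pattern


def mask_patches(pattern, patch_size, mask_ratio, seed=0):
    """Hide a random subset of square patches (hypothetical helper).

    Returns (masked_pattern, masked_patch_ids); hidden cells are set to None.
    """
    n = len(pattern)
    if n % patch_size != 0:
        raise ValueError("pattern size must be divisible by patch_size")
    per_side = n // patch_size
    num_patches = per_side * per_side
    num_masked = int(round(mask_ratio * num_patches))
    rng = random.Random(seed)
    masked_ids = sorted(rng.sample(range(num_patches), num_masked))
    masked = [row[:] for row in pattern]
    for pid in masked_ids:
        top = (pid // per_side) * patch_size
        left = (pid % per_side) * patch_size
        for r in range(top, top + patch_size):
            for c in range(left, left + patch_size):
                masked[r][c] = None
    return masked, masked_ids


def masked_mse(original, reconstruction, masked):
    """Mean squared error restricted to the hidden cells (an assumption)."""
    errors = [
        (original[r][c] - reconstruction[r][c]) ** 2
        for r in range(len(original))
        for c in range(len(original))
        if masked[r][c] is None
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Usage: mask half of the 2x2 patches in an 8x8 pattern and score a trivial
# all-zero baseline in place of a learned reconstruction.
pattern = toy_causal_attention(8)
masked, ids = mask_patches(pattern, patch_size=2, mask_ratio=0.5)
baseline = [[0.0] * 8 for _ in range(8)]
print(len(ids), masked_mse(pattern, baseline, masked))
```

In a real pipeline the all-zero baseline would be replaced by the output of a trained reconstruction model, and the toy pattern by attention maps collected from the language model under study.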
Cite
Text
Katzy et al. "Automated Attention Pattern Discovery at Scale in Large Language Models." Transactions on Machine Learning Research, 2026.
Markdown
[Katzy et al. "Automated Attention Pattern Discovery at Scale in Large Language Models." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/katzy2026tmlr-automated/)
BibTeX
@article{katzy2026tmlr-automated,
title = {{Automated Attention Pattern Discovery at Scale in Large Language Models}},
author = {Katzy, Jonathan and Popescu, Razvan Mihai and Mekkes, Erik and van Deursen, Arie and Izadi, Maliheh},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/katzy2026tmlr-automated/}
}