Attention-Based Clustering

Abstract

Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to automatically extract structure from data in an unsupervised setting. In particular, we demonstrate their suitability for clustering when the input data is generated from a Gaussian mixture model. To this end, we study a simplified two-head attention layer and define a population risk whose minimization with unlabeled data drives the head parameters to align with the true mixture centroids.
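The setting described in the abstract lends itself to a small numerical illustration. The sketch below is a hypothetical toy instantiation, not the paper's actual construction: it draws unlabeled samples from a symmetric two-component Gaussian mixture, defines a simplified two-head attention map f(x) = sum_h softmax(<w_h, x>)_h w_h, and minimizes an empirical reconstruction risk E||f(x) - x||^2 by gradient descent, then checks whether the head vectors drift toward the centroids +/-mu. The specific risk, the mixture parameters, and all function names are assumptions made for illustration only.

# Hypothetical sketch (NOT the paper's exact model or risk): a simplified
# two-head attention layer trained on unlabeled Gaussian-mixture data.
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 8, 2000, 0.3

# Ground-truth centroids +mu and -mu; unlabeled samples from the mixture
# 0.5 N(mu, sigma^2 I) + 0.5 N(-mu, sigma^2 I).
mu = np.zeros(d); mu[0] = 1.0
labels = rng.integers(0, 2, size=n)
X = np.where(labels[:, None] == 0, mu, -mu) + sigma * rng.standard_normal((n, d))

def two_head_attention(W, X):
    """f(x) = sum_h softmax(<w_h, x>)_h * w_h with head parameters W (2, d)."""
    scores = X @ W.T                          # (n, 2) attention logits
    scores -= scores.max(axis=1, keepdims=True)
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)         # softmax over the two heads
    return A @ W                              # convex combination of head vectors

def risk(W, X):
    """Empirical proxy for a population reconstruction risk E||f(x) - x||^2."""
    return np.mean(np.sum((two_head_attention(W, X) - X) ** 2, axis=1))

# Plain gradient descent with finite-difference gradients (dependency-free).
W = 0.1 * rng.standard_normal((2, d))
lr, eps = 0.2, 1e-5
for step in range(400):
    base = risk(W, X)
    G = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp = W.copy(); Wp[i, j] += eps
            G[i, j] = (risk(Wp, X) - base) / eps
    W -= lr * G

# If the toy dynamics behave as the abstract suggests, each row of W should
# align (up to sign and permutation) with a mixture centroid.
print("final risk:", risk(W, X))
print("cosine of heads with mu:",
      W @ mu / (np.linalg.norm(W, axis=1) * np.linalg.norm(mu)))

Finite differences are used only to keep the sketch free of autodiff dependencies; in practice one would compute the gradient analytically or with an autodiff framework, and the alignment of the rows of W with the centroids is what the paper's population-risk analysis makes rigorous.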

Cite

Text

Maulen-Soto et al. "Attention-Based Clustering." Advances in Neural Information Processing Systems, 2025.

Markdown

[Maulen-Soto et al. "Attention-Based Clustering." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/maulensoto2025neurips-attentionbased/)

BibTeX

@inproceedings{maulensoto2025neurips-attentionbased,
  title     = {{Attention-Based Clustering}},
  author    = {Maulen-Soto, Rodrigo and Marion, Pierre and Boyer, Claire},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/maulensoto2025neurips-attentionbased/}
}