T-SNE Exaggerates Clusters, Provably

Abstract

Central to the widespread use of t-distributed stochastic neighbor embedding (t-SNE) is the conviction that it produces visualizations whose structure roughly matches that of the input. On the contrary, we prove that neither (1) the strength of the input clustering nor (2) the extremity of outlier points can be reliably inferred from the t-SNE output. We also demonstrate that these failure modes are prevalent in practice.

Cite

Text

Bergam et al. "T-SNE Exaggerates Clusters, Provably." International Conference on Learning Representations, 2026.

Markdown

[Bergam et al. "T-SNE Exaggerates Clusters, Provably." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/bergam2026iclr-tsne/)

BibTeX

@inproceedings{bergam2026iclr-tsne,
  title     = {{T-SNE Exaggerates Clusters, Provably}},
  author    = {Bergam, Noah and Snoeck, Szymon and Verma, Nakul},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/bergam2026iclr-tsne/}
}