Clustering via Self-Supervised Diffusion

Abstract

Diffusion models, widely recognized for their success in generative tasks, have not yet been applied to clustering. We introduce Clustering via Diffusion (CLUDI), a self-supervised framework that combines the generative power of diffusion models with pre-trained Vision Transformer features to achieve robust and accurate clustering. CLUDI is trained via a teacher–student paradigm: the teacher uses stochastic diffusion-based sampling to produce diverse cluster assignments, which the student refines into stable predictions. This stochasticity acts as a novel data augmentation strategy, enabling CLUDI to uncover intricate structures in high-dimensional data. Extensive evaluations on challenging datasets demonstrate that CLUDI achieves state-of-the-art performance in unsupervised classification, setting new benchmarks in clustering robustness and adaptability to complex data distributions.
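The teacher–student idea in the abstract can be illustrated with a small sketch. This is not the CLUDI procedure. It assumes linear teacher and student heads and uses Gaussian noise on the teacher logits as a simple stand-in for diffusion-based sampling. All names (`teacher_samples`, `student_loss`, `noise_std`) are hypothetical, and the random features stand in for pre-trained Vision Transformer features.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax along the given axis.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def teacher_samples(features, teacher_w, n_samples, noise_std, rng):
    """Draw several stochastic soft cluster assignments per input.

    ASSUMPTION: Gaussian noise on the logits is only a stand-in for
    diffusion-based sampling; it is not the paper's sampler.
    """
    logits = features @ teacher_w                      # (n, k)
    noise = rng.normal(0.0, noise_std, size=(n_samples,) + logits.shape)
    return softmax(logits[None] + noise)               # (n_samples, n, k)

def student_loss(features, student_w, targets):
    """Cross-entropy between teacher targets and student predictions."""
    probs = softmax(features @ student_w)              # (n, k)
    return float(-(targets * np.log(probs + 1e-12)).sum(axis=1).mean())

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))    # placeholder for ViT features
teacher_w = rng.normal(size=(16, 4))   # hypothetical teacher head, 4 clusters
student_w = rng.normal(size=(16, 4))   # hypothetical student head

# The teacher yields diverse assignments; averaging them gives the
# target that the student is trained to match.
samples = teacher_samples(features, teacher_w, n_samples=5, noise_std=0.5, rng=rng)
targets = samples.mean(axis=0)
loss = student_loss(features, student_w, targets)
```

In this sketch the diversity across `samples` plays the role the abstract assigns to stochastic sampling, acting like an augmentation of the training targets, while the student produces one deterministic prediction per input.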

Cite

Text

Uziel et al. "Clustering via Self-Supervised Diffusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Uziel et al. "Clustering via Self-Supervised Diffusion." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/uziel2025icml-clustering/)

BibTeX

@inproceedings{uziel2025icml-clustering,
  title     = {{Clustering via Self-Supervised Diffusion}},
  author    = {Uziel, Roy and Chelly, Irit and Freifeld, Oren and Pakman, Ari},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {60711--60726},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/uziel2025icml-clustering/}
}