Efficient Centroid-Linkage Clustering

Abstract

We give an algorithm for Centroid-Linkage Hierarchical Agglomerative Clustering (HAC) that computes a $c$-approximate clustering in roughly $n^{1+O(1/c^2)}$ time. We obtain our result by combining a new centroid-linkage HAC algorithm with a novel fully dynamic data structure for nearest neighbor search that works under adaptive updates. We also evaluate our algorithm empirically. By leveraging a state-of-the-art nearest-neighbor search library, we obtain a fast and accurate centroid-linkage HAC algorithm. Compared to an existing state-of-the-art exact baseline, our implementation maintains the clustering quality while delivering up to a $36\times$ speedup due to performing fewer distance comparisons.
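To make "centroid linkage" concrete, here is a minimal sketch of the naive exact procedure, assuming Euclidean distance. It is not the paper's algorithm, only the textbook baseline such work aims to speed up. Every active cluster is represented by its centroid and size. Each round merges the two clusters whose centroids are closest, and the merged centroid is the size-weighted average of the two. The function name `centroid_linkage_hac` and the id convention (new clusters numbered `n, n+1, ...`) are hypothetical choices made for illustration. A $c$-approximate variant is commonly defined as one that may merge any pair whose centroid distance is within a factor $c$ of the current minimum. That relaxation is what lets a faster method replace the exhaustive pair scan below with approximate nearest-neighbor queries.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid_linkage_hac(points: Sequence[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Naive exact centroid-linkage HAC (illustrative baseline, not the paper's method).

    Returns the merge sequence as (cluster_a, cluster_b, centroid_distance).
    Input points get ids 0..n-1; each merged cluster gets the next unused id.
    """
    n = len(points)
    # Active cluster id -> (centroid, number of points in the cluster).
    active: Dict[int, Tuple[Point, int]] = {
        i: (tuple(float(x) for x in p), 1) for i, p in enumerate(points)
    }
    merges: List[Tuple[int, int, float]] = []
    next_id = n
    while len(active) > 1:
        ids = sorted(active)
        best = None
        # Exhaustive scan over all active pairs: O(k^2) distance evaluations per round.
        for idx, a in enumerate(ids):
            for b in ids[idx + 1:]:
                d = math.dist(active[a][0], active[b][0])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        (ca, sa), (cb, sb) = active.pop(a), active.pop(b)
        # The new centroid is the size-weighted mean of the two merged centroids.
        merged = tuple((sa * x + sb * y) / (sa + sb) for x, y in zip(ca, cb))
        active[next_id] = (merged, sa + sb)
        merges.append((a, b, d))
        next_id += 1
    return merges


if __name__ == "__main__":
    print(centroid_linkage_hac([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)]))
```

On the three-point example, points 0 and 1 merge first at distance 1.0 into cluster 3 with centroid (0, 0.5). Cluster 3 then merges with point 2 at distance $\sqrt{100.25}$. Because every round rescans all pairs, this baseline performs $O(n^3)$ distance comparisons overall. Reducing that comparison count is the source of the speedups the abstract reports.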

Cite

Text

Bateni et al. "Efficient Centroid-Linkage Clustering." Neural Information Processing Systems, 2024. doi:10.52202/079017-1571

Markdown

[Bateni et al. "Efficient Centroid-Linkage Clustering." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/bateni2024neurips-efficient/) doi:10.52202/079017-1571

BibTeX

@inproceedings{bateni2024neurips-efficient,
  title     = {{Efficient Centroid-Linkage Clustering}},
  author    = {Bateni, MohammadHossein and Dhulipala, Laxman and Fletcher, Willem and Gowda, Kishen N. and Hershkowitz, D Ellis and Jayaram, Rajesh and Łącki, Jakub},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1571},
  url       = {https://mlanthology.org/neurips/2024/bateni2024neurips-efficient/}
}