Hierarchical Clustering in General Metric Spaces Using Approximate Nearest Neighbors

Abstract

Hierarchical clustering is a widely used data analysis method, but suffers from scalability issues, requiring quadratic time in general metric spaces. In this work, we demonstrate how approximate nearest neighbor (ANN) queries can be used to improve the running time of the popular single-linkage and average-linkage methods. Our proposed algorithms are the first subquadratic time algorithms for non-Euclidean metrics. We complement our theoretical analysis with an empirical evaluation showcasing our methods’ efficiency and accuracy.

Cite

Text

Moseley et al. "Hierarchical Clustering in General Metric Spaces Using Approximate Nearest Neighbors." Artificial Intelligence and Statistics, 2021.

Markdown

[Moseley et al. "Hierarchical Clustering in General Metric Spaces Using Approximate Nearest Neighbors." Artificial Intelligence and Statistics, 2021.](https://mlanthology.org/aistats/2021/moseley2021aistats-hierarchical/)

BibTeX

@inproceedings{moseley2021aistats-hierarchical,
  title     = {{Hierarchical Clustering in General Metric Spaces Using Approximate Nearest Neighbors}},
  author    = {Moseley, Benjamin and Vassilvtiskii, Sergei and Wang, Yuyan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2021},
  pages     = {2440-2448},
  volume    = {130},
  url       = {https://mlanthology.org/aistats/2021/moseley2021aistats-hierarchical/}
}