Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs
Abstract
Hierarchical clustering studies the recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms. Our first result is an $O(1)$-approximation algorithm for graphs of high conductance; its simple construction bypasses the complicated recursive routines for finding sparse cuts known in the literature. Our second and main result is an $O(1)$-approximation algorithm for a wide family of graphs that exhibit a well-defined structure of clusters. This result generalises the previous state-of-the-art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by an empirical analysis on both synthetic and real-world data sets, on which our presented algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
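For reference, the cost function introduced by Dasgupta, which the approximation guarantees above are stated against, is standardly defined for a weighted graph $G = (V, E, w)$ and a hierarchical clustering tree $T$ whose leaves are the vertices of $G$; the notation below is a brief sketch of that standard definition and is not taken from the paper's own text:

$$\mathrm{cost}_G(T) = \sum_{\{u,v\} \in E} w(u,v) \cdot \bigl|\mathrm{leaves}\bigl(T[u \vee v]\bigr)\bigr|,$$

where $T[u \vee v]$ denotes the subtree of $T$ rooted at the lowest common ancestor of $u$ and $v$, and $|\mathrm{leaves}(\cdot)|$ counts its leaves. Intuitively, a low-cost tree cuts heavy edges only deep in the tree, inside small subtrees, so the $O(1)$-approximation results bound the ratio between the cost of the returned tree and the minimum possible cost over all trees.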
Cite
Text
Manghiuc and Sun. "Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs." Neural Information Processing Systems, 2021.
Markdown
[Manghiuc and Sun. "Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/manghiuc2021neurips-hierarchical/)
BibTeX
@inproceedings{manghiuc2021neurips-hierarchical,
title = {{Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs}},
author = {Manghiuc, Bogdan-Adrian and Sun, He},
booktitle = {Neural Information Processing Systems},
year = {2021},
url = {https://mlanthology.org/neurips/2021/manghiuc2021neurips-hierarchical/}
}