Hierarchical Clustering Beyond the Worst-Case

Abstract

Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, recently Dasgupta [1] proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective (see also [2, 3, 4]). In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic blockmodel (HSBM), and show that in certain regimes the SVD approach of McSherry [5] combined with specific linkage methods results in a clustering that give an O(1)-approximation to Dasgupta’s cost function. We also show that an approach based on SDP relaxations for balanced cuts based on the work of Makarychev et al. [6], combined with the recursive sparsest cut algorithm of Dasgupta, yields an O(1) approximation in slightly larger regimes and also in the semi-random setting, where an adversary may remove edges from the random graph generated according to an HSBM. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.

Cite

Text

Cohen-Addad et al. "Hierarchical Clustering Beyond the Worst-Case." Neural Information Processing Systems, 2017.

Markdown

[Cohen-Addad et al. "Hierarchical Clustering Beyond the Worst-Case." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/cohenaddad2017neurips-hierarchical/)

BibTeX

@inproceedings{cohenaddad2017neurips-hierarchical,
  title     = {{Hierarchical Clustering Beyond the Worst-Case}},
  author    = {Cohen-Addad, Vincent and Kanade, Varun and Mallmann-Trenn, Frederik},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {6201-6209},
  url       = {https://mlanthology.org/neurips/2017/cohenaddad2017neurips-hierarchical/}
}