Finding Low Error Clusterings
Abstract
A common approach for solving clustering problems is to design algorithms to approximately optimize various objective functions (e.g., k-means or min-sum) defined in terms of some given pairwise distance or similarity information. However, in many learning motivated clustering applications (such as clustering proteins by function) there is some unknown target clustering; in such cases the pairwise information is merely based on heuristics and the real goal is to achieve low error on the data. In these settings, an arbitrary c-approximation algorithm for some objective would work well only if any c-approximation to that objective is close to the target clustering. In recent work, Balcan et. al [7] have shown how both for the k-means and k-median objectives this property allows one to produce clusterings of low error, even for values c such that getting a c-approximation to these objective functions is provably NP-hard. In this paper we analyze the min-sum objective from this perspective. While [7] also considered the min-sum problem, the results they derived for this objective were substantially weaker. In this work we derive new and more subtle structural properties for min-sum in this context and use these to design efficient algorithms for producing accurate clusterings, both in the transductive and in the inductive case. We also analyze the correlation clustering problem from this perspective, and point out interesting differences between this objective and k-median, k-means, or min-sum objectives.
Cite
Text
Balcan and Braverman. "Finding Low Error Clusterings." Annual Conference on Computational Learning Theory, 2009.Markdown
[Balcan and Braverman. "Finding Low Error Clusterings." Annual Conference on Computational Learning Theory, 2009.](https://mlanthology.org/colt/2009/balcan2009colt-finding/)BibTeX
@inproceedings{balcan2009colt-finding,
title = {{Finding Low Error Clusterings}},
author = {Balcan, Maria-Florina and Braverman, Mark},
booktitle = {Annual Conference on Computational Learning Theory},
year = {2009},
url = {https://mlanthology.org/colt/2009/balcan2009colt-finding/}
}