Efficient Clustering with Limited Distance Information

Konstantin Voevodski, Maria-Florina Balcan, Heiko Röglin, Shang-Hua Teng, Yu Xia

UAI 2010 pp. 632-640

/uai/2010/voevodski2010uai-efficient/

Abstract

Given a point set S and an unknown metric d on S, we study the problem of efficiently partitioning S into k clusters while querying few distances between the points. In our model we assume that we have access to one versus all queries that given a point s in S return the distances between s and all other points. We show that given a natural assumption about the structure of the instance, we can efficiently find an accurate clustering using only O(k) distance queries. Our algorithm uses an active selection strategy to choose a small set of points that we call landmarks, and considers only the distances between landmarks and other points to produce a clustering. We use our algorithm to cluster proteins by sequence similarity. This setting nicely fits our model because we can use a fast sequence database search program to query a sequence against an entire dataset. We conduct an empirical study that shows that even though we query a small fraction of the distances between the points, we produce clusterings that are close to a desired clustering given by manual classification.

UAI Semantic Scholar

Cite

Text

Voevodski et al. "Efficient Clustering with Limited Distance Information." Conference on Uncertainty in Artificial Intelligence, 2010.

Markdown

[Voevodski et al. "Efficient Clustering with Limited Distance Information." Conference on Uncertainty in Artificial Intelligence, 2010.](https://mlanthology.org/uai/2010/voevodski2010uai-efficient/)

BibTeX

@inproceedings{voevodski2010uai-efficient,
  title     = {{Efficient Clustering with Limited Distance Information}},
  author    = {Voevodski, Konstantin and Balcan, Maria-Florina and Röglin, Heiko and Teng, Shang-Hua and Xia, Yu},
  booktitle = {Conference on Uncertainty in Artificial Intelligence},
  year      = {2010},
  pages     = {632-640},
  url       = {https://mlanthology.org/uai/2010/voevodski2010uai-efficient/}
}