Clustering Using Approximate Nearest Neighbour Oracles

Ullah, Enayat; Lang, Harry; Arora, Raman; Braverman, Vladimir

Clustering Using Approximate Nearest Neighbour Oracles

Enayat Ullah, Harry Lang, Raman Arora, Vladimir Braverman

TMLR 2023

/tmlr/2023/ullah2023tmlr-clustering/

Abstract

We study the problem of clustering data points in a streaming setting when one has access to the geometry of the space only via approximate nearest neighbour (ANN) oracles. In this setting, we present algorithms for streaming $O(1)$-approximate $k$-median clustering and its (streaming) coreset construction. In certain domains of interest, such as spaces with constant expansion, our algorithms improve upon the best-known runtime of both these problems. Furthermore, our results extend to cost functions satisfying the approximate triangle inequality, which subsumes $k$-means clustering and $M$-estimators. Finally, we run experiments on Census1990 dataset wherein the results empirically support our theory.

PDF TMLR Semantic Scholar

Cite

Text

Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.

Markdown

[Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/)

BibTeX

@article{ullah2023tmlr-clustering,
  title     = {{Clustering Using Approximate Nearest Neighbour Oracles}},
  author    = {Ullah, Enayat and Lang, Harry and Arora, Raman and Braverman, Vladimir},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/}
}