Clustering Using Approximate Nearest Neighbour Oracles

Abstract

We study the problem of clustering data points in a streaming setting when one has access to the geometry of the space only via approximate nearest neighbour (ANN) oracles. In this setting, we present algorithms for streaming $O(1)$-approximate $k$-median clustering and its (streaming) coreset construction. In certain domains of interest, such as spaces with constant expansion, our algorithms improve upon the best-known runtime of both these problems. Furthermore, our results extend to cost functions satisfying the approximate triangle inequality, which subsumes $k$-means clustering and $M$-estimators. Finally, we run experiments on Census1990 dataset wherein the results empirically support our theory.

Cite

Text

Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.

Markdown

[Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/)

BibTeX

@article{ullah2023tmlr-clustering,
  title     = {{Clustering Using Approximate Nearest Neighbour Oracles}},
  author    = {Ullah, Enayat and Lang, Harry and Arora, Raman and Braverman, Vladimir},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/}
}