Clustering Using Approximate Nearest Neighbour Oracles
Abstract
We study the problem of clustering data points in a streaming setting when one has access to the geometry of the space only via approximate nearest neighbour (ANN) oracles. In this setting, we present algorithms for streaming $O(1)$-approximate $k$-median clustering and its (streaming) coreset construction. In certain domains of interest, such as spaces with constant expansion, our algorithms improve upon the best-known runtime of both these problems. Furthermore, our results extend to cost functions satisfying the approximate triangle inequality, which subsumes $k$-means clustering and $M$-estimators. Finally, we run experiments on Census1990 dataset wherein the results empirically support our theory.
Cite
Text
Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.Markdown
[Ullah et al. "Clustering Using Approximate Nearest Neighbour Oracles." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/)BibTeX
@article{ullah2023tmlr-clustering,
title = {{Clustering Using Approximate Nearest Neighbour Oracles}},
author = {Ullah, Enayat and Lang, Harry and Arora, Raman and Braverman, Vladimir},
journal = {Transactions on Machine Learning Research},
year = {2023},
url = {https://mlanthology.org/tmlr/2023/ullah2023tmlr-clustering/}
}