On Coresets for Clustering in Small Dimensional Euclidean Spaces

Abstract

We consider the problem of constructing small coresets for $k$-Median in Euclidean spaces. Given a large set of data points $P\subset \mathbb{R}^d$, a coreset is a much smaller set $S\subset \mathbb{R}^d$, so that the $k$-Median costs of any $k$ centers w.r.t. $P$ and $S$ are close. Existing literature mainly focuses on the high-dimension case and there has been great success in obtaining dimension-independent bounds, whereas the case for small $d$ is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for $k$-Median in small dimensions. For small $d$, a natural question is whether existing near-optimal dimension-independent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small $d$. In particular, we completely settle the coreset size bound for $1$-d $k$-Median (up to log factors). Interestingly, our results imply a strong separation between $1$-d $1$-Median and $1$-d $2$-Median. As far as we know, this is the first such separation between $k=1$ and $k=2$ in any dimension.

Cite

Text

Huang et al. "On Coresets for Clustering in Small Dimensional Euclidean Spaces." International Conference on Machine Learning, 2023.

Markdown

[Huang et al. "On Coresets for Clustering in Small Dimensional Euclidean Spaces." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/huang2023icml-coresets/)

BibTeX

@inproceedings{huang2023icml-coresets,
  title     = {{On Coresets for Clustering in Small Dimensional Euclidean Spaces}},
  author    = {Huang, Lingxiao and Huang, Ruiyuan and Huang, Zengfeng and Wu, Xuan},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {13891-13915},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/huang2023icml-coresets/}
}