Near-Optimal Algorithms for Explainable K-Medians and K-Means

Abstract

We consider the problem of explainable $k$-medians and $k$-means introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). In this problem, our goal is to find a \emph{threshold decision tree} that partitions data into $k$ clusters and minimizes the $k$-medians or $k$-means objective. The obtained clustering is easy to interpret because every decision node of a threshold tree splits data based on a single feature into two groups. We propose a new algorithm for this problem which is $\tilde O(\log k)$ competitive with $k$-medians with $\ell_1$ norm and $\tilde O(k)$ competitive with $k$-means. This is an improvement over the previous guarantees of $O(k)$ and $O(k^2)$ by Dasgupta et al. (2020). We also provide a new algorithm which is $O(\log^{3/2} k)$ competitive for $k$-medians with $\ell_2$ norm. Our first algorithm is near-optimal: Dasgupta et al. (2020) showed a lower bound of $\Omega(\log k)$ for $k$-medians; in this work, we prove a lower bound of $\tilde\Omega(k)$ for $k$-means. We also provide a lower bound of $\Omega(\log k)$ for $k$-medians with $\ell_2$ norm.
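
To make the objective concrete, below is a minimal Python sketch (not the paper's algorithm) of how a threshold decision tree induces a $k$-way partition and how the explainable $k$-medians cost with the $\ell_1$ norm would be evaluated on it. The names ThresholdNode, assign, and kmedians_l1_cost are illustrative, not from the paper.

import numpy as np


class ThresholdNode:
    """Internal node of a threshold tree: split on a single feature at a threshold.

    Points with x[feature] <= threshold go left, the rest go right.
    A leaf is represented by an integer cluster label instead of a node.
    """

    def __init__(self, feature, threshold, left, right):
        self.feature = feature
        self.threshold = threshold
        self.left = left    # ThresholdNode or int (leaf cluster label)
        self.right = right  # ThresholdNode or int (leaf cluster label)


def assign(tree, x):
    """Follow single-feature threshold cuts until a leaf (cluster label) is reached."""
    node = tree
    while isinstance(node, ThresholdNode):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node


def kmedians_l1_cost(points, tree, centers):
    """Sum of l1 distances from each point to the center of the leaf cluster it lands in."""
    return sum(np.abs(x - centers[assign(tree, x)]).sum() for x in points)


if __name__ == "__main__":
    # Toy example with k = 3 clusters in 2 dimensions.
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    points = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

    # A threshold tree with two cuts: first on feature 0, then on feature 1.
    tree = ThresholdNode(
        feature=0, threshold=5.0,
        left=ThresholdNode(feature=1, threshold=5.0, left=0, right=2),
        right=1,
    )
    print("explainable k-medians (l1) cost:", kmedians_l1_cost(points, tree, centers))

The competitive ratios in the abstract compare the cost of such a tree-induced clustering against the optimal unconstrained $k$-medians or $k$-means clustering.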

Cite

Text

Makarychev and Shan. "Near-Optimal Algorithms for Explainable K-Medians and K-Means." International Conference on Machine Learning, 2021.

Markdown

[Makarychev and Shan. "Near-Optimal Algorithms for Explainable K-Medians and K-Means." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/makarychev2021icml-nearoptimal/)

BibTeX

@inproceedings{makarychev2021icml-nearoptimal,
  title     = {{Near-Optimal Algorithms for Explainable K-Medians and K-Means}},
  author    = {Makarychev, Konstantin and Shan, Liren},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {7358--7367},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/makarychev2021icml-nearoptimal/}
}