Scalable Fair Clustering

Abstract

We study the fair variant of the classic k-median problem introduced by (Chierichetti et al., NeurIPS 2017) in which the points are colored, and the goal is to minimize the same average distance objective as in the standard $k$-median problem while ensuring that all clusters have an “approximately equal” number of points of each color. (Chierichetti et al., NeurIPS 2017) proposed a two-phase algorithm for fair $k$-clustering. In the first step, the pointset is partitioned into subsets called fairlets that satisfy the fairness requirement and approximately preserve the k-median objective. In the second step, fairlets are merged into k clusters by one of the existing k-median algorithms. The running time of this algorithm is dominated by the first step, which takes super-quadratic time. In this paper, we present a practical approximate fairlet decomposition algorithm that runs in nearly linear time.

Cite

Text

Backurs et al. "Scalable Fair Clustering." International Conference on Machine Learning, 2019.

Markdown

[Backurs et al. "Scalable Fair Clustering." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/backurs2019icml-scalable/)

BibTeX

@inproceedings{backurs2019icml-scalable,
  title     = {{Scalable Fair Clustering}},
  author    = {Backurs, Arturs and Indyk, Piotr and Onak, Krzysztof and Schieber, Baruch and Vakilian, Ali and Wagner, Tal},
  booktitle = {International Conference on Machine Learning},
  year      = {2019},
  pages     = {405-413},
  volume    = {97},
  url       = {https://mlanthology.org/icml/2019/backurs2019icml-scalable/}
}