Diversity-Aware K-Median: Clustering with Fair Center Representation
Abstract
We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity or gender. We then seek a minimum-cost clustering of the data into $k$ clusters such that a specified minimum number of cluster centers is chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients $C$, a set of facilities $\mathcal{F}$, a collection $\mathscr{F} = \{F_1, \ldots, F_t\}$ of facility groups $F_i \subseteq \mathcal{F}$, a budget $k$, and a set of lower-bound thresholds $R = \{r_1, \ldots, r_t\}$, one for each group in $\mathscr{F}$. The diversity-aware $k$-median problem asks to find a set $S$ of $k$ facilities in $\mathcal{F}$ such that $|S \cap F_i| \geq r_i$, that is, at least $r_i$ centers in $S$ are from group $F_i$, and the $k$-median cost $\sum_{c \in C} \min_{s \in S} d(c, s)$ is minimized. We show that in the general case, where the facility groups may overlap, the diversity-aware $k$-median problem is NP-hard, fixed-parameter intractable, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.

Despite these negative results, we show that the variant of the problem with disjoint facility types can be approximated efficiently. We also present heuristic algorithms that solve real-world problem instances in practice, and we empirically evaluate the proposed solutions in an extensive set of experiments.
The main open problem is the complexity of the approximation algorithm, in the setting of [...] and $t$ not linear for obtaining exact solutions, again in the case of disjoint groups.
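To make the problem definition in the abstract concrete, the following is a minimal sketch of an exhaustive solver for tiny instances. It is not one of the paper's algorithms: it simply enumerates every set of $k$ facilities, keeps those with at least $r_i$ centers from each group $F_i$, and returns the cheapest. The function names, the toy instance on a line, and the absolute-difference distance are all illustrative assumptions, and the running time is exponential in $k$.

```python
from itertools import combinations


def kmedian_cost(clients, centers, dist):
    """k-median cost: each client pays the distance to its nearest center."""
    return sum(min(dist(c, s) for s in centers) for c in clients)


def diversity_aware_kmedian_bruteforce(clients, facilities, groups,
                                       requirements, k, dist):
    """Exhaustive search (hypothetical helper, for illustration only).

    groups[i] is a set of facilities (groups may overlap) and
    requirements[i] is the lower bound r_i for that group.
    Returns (best_set, best_cost), or (None, inf) if no feasible set exists.
    """
    best_set, best_cost = None, float("inf")
    for candidate in combinations(facilities, k):
        chosen = set(candidate)
        # Feasibility: at least r_i chosen centers in every group F_i.
        if all(len(chosen & g) >= r for g, r in zip(groups, requirements)):
            cost = kmedian_cost(clients, candidate, dist)
            if cost < best_cost:
                best_set, best_cost = chosen, cost
    return best_set, best_cost


# Toy instance (assumed data): points on a line, distance |a - b|.
clients = [0, 1, 2, 10, 11, 12]
facilities = [1, 3, 11, 20]
groups = [{1, 3, 11}, {20}]


def dist(a, b):
    return abs(a - b)


# Without representation requirements, the cheapest centers are {1, 11}.
unconstrained = diversity_aware_kmedian_bruteforce(
    clients, facilities, groups, [0, 0], 2, dist)

# Requiring one center from each group forces facility 20 into the solution.
constrained = diversity_aware_kmedian_bruteforce(
    clients, facilities, groups, [1, 1], 2, dist)

# Requirements that exceed the budget k admit no feasible solution.
infeasible = diversity_aware_kmedian_bruteforce(
    clients, facilities, groups, [2, 1], 2, dist)
```

On this toy instance the representation constraint raises the cost from 4 to 28, which illustrates the trade-off between clustering cost and center diversity that the problem formalizes. The last call shows that some requirement vectors have no feasible solution at all.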
Cite
Text
Thejaswi et al. "Diversity-Aware K-Median: Clustering with Fair Center Representation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86520-7_47
Markdown
[Thejaswi et al. "Diversity-Aware K-Median: Clustering with Fair Center Representation." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/thejaswi2021ecmlpkdd-diversityaware/) doi:10.1007/978-3-030-86520-7_47
BibTeX
@inproceedings{thejaswi2021ecmlpkdd-diversityaware,
title = {{Diversity-Aware K-Median: Clustering with Fair Center Representation}},
author = {Thejaswi, Suhas and Ordozgoiti, Bruno and Gionis, Aristides},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2021},
pages = {765-780},
doi = {10.1007/978-3-030-86520-7_47},
url = {https://mlanthology.org/ecmlpkdd/2021/thejaswi2021ecmlpkdd-diversityaware/}
}