New Coresets for Projective Clustering and Applications

Abstract

$(j,k)$-projective clustering is the natural generalization of the family of $k$-clustering and $j$-subspace clustering problems. Given a set of points $P$ in $\mathbb{R}^d$, the goal is to find $k$ flats of dimension $j$, i.e., affine subspaces, that best fit $P$ under a given distance measure. In this paper, we propose the first algorithm that returns an $L_\infty$ coreset of size polynomial in $d$. Moreover, we give the first strong coreset construction for general $M$-estimator regression. Specifically, we show that our construction provides efficient coreset constructions for Cauchy, Welsch, Huber, Geman-McClure, Tukey, $L_1-L_2$, and Fair regression, as well as general concave and power-bounded loss functions. Finally, we provide experimental results based on real-world datasets, showing the efficacy of our approach.

Cite

Text

Tukan et al. "New Coresets for Projective Clustering and Applications." Artificial Intelligence and Statistics, 2022.

Markdown

[Tukan et al. "New Coresets for Projective Clustering and Applications." Artificial Intelligence and Statistics, 2022.](https://mlanthology.org/aistats/2022/tukan2022aistats-new/)

BibTeX

@inproceedings{tukan2022aistats-new,
  title     = {{New Coresets for Projective Clustering and Applications}},
  author    = {Tukan, Murad and Wu, Xuan and Zhou, Samson and Braverman, Vladimir and Feldman, Dan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2022},
  pages     = {5391-5415},
  volume    = {151},
  url       = {https://mlanthology.org/aistats/2022/tukan2022aistats-new/}
}