Clustering High-Dimensional Data with Ordered Weighted $\ell_1$ Regularization

Abstract

Clustering complex high-dimensional data is particularly challenging because the signal-to-noise ratio in such data is significantly lower than in their classical low-dimensional counterparts. This is mainly because most of the features describing a data point carry little or no information about the natural grouping of the data. Filtering out such features is thus critical for extracting meaningful information from large-scale data. Many recent methods have attempted to estimate feature importance in a centroid-based clustering setting. Though empirically successful in classical low-dimensional settings, most perform poorly in high dimensions, especially on microarray and single-cell RNA-seq data. This paper extends weighted center-based clustering with the Ordered Weighted $\ell_1$ (OWL) norm to achieve better feature selection. By exploiting the properties of block coordinate descent and the Frank-Wolfe algorithm, we not only maintain computational efficiency but also outperform the state of the art in high-dimensional settings. The proposal also comes with finite-sample theoretical guarantees, including a rate of $\mathcal{O}\left(\sqrt{k \log p/n}\right)$ under model sparsity, bridging the gap between the theory and practice of weighted clustering.
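The abstract names the OWL norm and the Frank-Wolfe algorithm but does not spell out how they are combined. For weights $w_1 \ge w_2 \ge \dots \ge w_p \ge 0$ with $w_1 > 0$, the OWL norm is $\Omega_w(x) = \sum_{i=1}^{p} w_i |x|_{[i]}$, where $|x|_{[i]}$ is the $i$-th largest magnitude of $x$. The sketch below is an illustration under stated assumptions, not the authors' clustering algorithm (which additionally uses block coordinate descent): it computes the OWL norm, a linear minimization oracle over the OWL ball of radius $\tau$ (using the standard atomic description of that ball), and a plain Frank-Wolfe loop on a toy least-squares objective. The names `owl_norm`, `owl_lmo`, `frank_wolfe_owl`, the radius `tau`, and the toy objective are all hypothetical.

```python
from typing import List, Sequence


def owl_norm(x: Sequence[float], w: Sequence[float]) -> float:
    """OWL norm: sum_i w_i * |x|_[i], magnitudes sorted in decreasing order.

    Assumes w is non-negative, non-increasing, and has the same length as x.
    """
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, mags))


def owl_lmo(g: Sequence[float], w: Sequence[float], tau: float) -> List[float]:
    """Hypothetical oracle: argmin <g, s> subject to owl_norm(s, w) <= tau.

    The minimizer puts equal magnitude tau / (w_1 + ... + w_k) on the k
    largest |g_i|, with signs opposite to g, where k maximizes
    (sum of the top-k |g|) / (w_1 + ... + w_k). Assumes w_1 > 0.
    """
    order = sorted(range(len(g)), key=lambda i: abs(g[i]), reverse=True)
    best_k, best_ratio, best_wsum = 1, -1.0, w[0]
    g_sum = w_sum = 0.0
    for k, i in enumerate(order, start=1):
        g_sum += abs(g[i])
        w_sum += w[k - 1]
        ratio = g_sum / w_sum
        if ratio > best_ratio:
            best_k, best_ratio, best_wsum = k, ratio, w_sum
    scale = tau / best_wsum
    s = [0.0] * len(g)
    for i in order[:best_k]:
        # Move against the gradient; zero-gradient entries stay at 0.
        if g[i] > 0:
            s[i] = -scale
        elif g[i] < 0:
            s[i] = scale
    return s


def frank_wolfe_owl(b: Sequence[float], w: Sequence[float], tau: float,
                    iters: int = 200) -> List[float]:
    """Toy Frank-Wolfe: minimize 0.5 * ||x - b||^2 over the OWL ball."""
    x = [0.0] * len(b)  # the origin is feasible
    for t in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]
        s = owl_lmo(grad, w, tau)
        gamma = 2.0 / (t + 2.0)  # standard open-loop step size
        # Convex combination of feasible points stays inside the ball.
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


if __name__ == "__main__":
    w = [1.0, 0.5, 0.25]
    b = [3.0, -1.0, 0.2]
    x = frank_wolfe_owl(b, w, tau=2.0)
    print(x, owl_norm(x, w))
```

Frank-Wolfe only ever calls the linear oracle and never projects onto the OWL ball, which is one reason the method is attractive for sorted-weight norms whose projections are more involved; the $2/(t+2)$ step size is the textbook default, chosen here only for simplicity.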

Cite

Text

Chakraborty et al. "Clustering High-Dimensional Data with Ordered Weighted $\ell_1$ Regularization." Artificial Intelligence and Statistics, 2023.

BibTeX

@inproceedings{chakraborty2023aistats-clustering,
  title     = {{Clustering High-Dimensional Data with Ordered Weighted $\ell_1$ Regularization}},
  author    = {Chakraborty, Chandramauli and Paul, Sayan and Chakraborty, Saptarshi and Das, Swagatam},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2023},
  pages     = {7176--7189},
  volume    = {206},
  url       = {https://mlanthology.org/aistats/2023/chakraborty2023aistats-clustering/}
}