Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions

Abstract

We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum likelihood estimator achieves the minimax lower bound. However, this optimization-based estimator is computationally intractable because the objective function is highly nonconvex and the feasible set involves discrete structures. To address the computational challenge, we propose a computationally tractable Bayesian approach to estimate high-dimensional Gaussian mixtures whose cluster centers exhibit sparsity using a continuous spike-and-slab prior. We further prove that the posterior contraction rate of the proposed Bayesian method is minimax optimal. The misclustering rate is obtained as a by-product using tools from matrix perturbation theory. The proposed Bayesian sparse Gaussian mixture model does not require pre-specifying the number of clusters, which can be adaptively estimated. The validity and usefulness of the proposed method are demonstrated through simulation studies and the analysis of a real-world single-cell RNA sequencing data set.

Cite

Text

Yao et al. "Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions." Journal of Machine Learning Research, 2025.

Markdown

[Yao et al. "Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions." Journal of Machine Learning Research, 2025.](https://mlanthology.org/jmlr/2025/yao2025jmlr-bayesian/)

BibTeX

@article{yao2025jmlr-bayesian,
  title     = {{Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions}},
  author    = {Yao, Dapeng and Xie, Fangzheng and Xu, Yanxun},
  journal   = {Journal of Machine Learning Research},
  year      = {2025},
  pages     = {1-50},
  volume    = {26},
  url       = {https://mlanthology.org/jmlr/2025/yao2025jmlr-bayesian/}
}