Penalized Model-Based Clustering with Application to Variable Selection

Abstract

Variable selection in clustering analysis is both challenging and important. In the context of model-based clustering analysis with a common diagonal covariance matrix, which is especially suitable for "high dimension, low sample size" settings, we propose a penalized likelihood approach with an L1 penalty function that automatically realizes variable selection via thresholding and delivers a sparse solution. We derive an EM algorithm to fit the proposed model and propose a modified BIC as a model selection criterion for choosing the number of components and the penalization parameter. A simulation study and an application to gene function prediction with gene expression profiles demonstrate the utility of our method.
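The abstract describes the fitting procedure only in outline, so the following is a minimal sketch of how such a model could be fitted, not the authors' implementation. It assumes the columns of X are already centered, so that shrinking a cluster mean to zero means "no different from the overall center", and it maximizes the penalized log-likelihood log L − λ Σ_k Σ_j |μ_kj| for a Gaussian mixture with a shared diagonal covariance. Maximizing the expected complete-data log-likelihood over μ_kj, holding σ_j² fixed, gives a soft-thresholded weighted mean: μ_kj = sign(m_kj)(|m_kj| − λσ_j²/n_k)_+, where m_kj is the responsibility-weighted mean and n_k = Σ_i τ_ik. A variable whose means are zero in every cluster is treated as irrelevant. The effective parameter count in the BIC (K − 1 + p + number of nonzero means) is an assumed form. The farthest-first initialization, the fixed iteration count, and the names `penalized_em`, `modified_bic` and `selected_variables` are hypothetical choices made for illustration.

```python
import math


def _log_joint(x, pi, mu, var):
    """Return log(pi_k) + log N(x; mu_k, diag(var)) for every component k."""
    const = sum(math.log(2.0 * math.pi * v) for v in var)
    return [math.log(pi[k]) - 0.5 * (const + sum((xj - mj) ** 2 / v
                                                 for xj, mj, v in zip(x, mu[k], var)))
            for k in range(len(pi))]


def _logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def _init(X, K):
    """Hypothetical deterministic start: farthest-first centers, then the
    per-variable mean squared residual to the nearest center."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    centers = [max(X, key=lambda x: sqdist(x, [0.0] * len(x)))]
    while len(centers) < K:
        centers.append(max(X, key=lambda x: min(sqdist(x, c) for c in centers)))
    nearest = [min(centers, key=lambda c: sqdist(x, c)) for x in X]
    var = [max(sum((x[j] - c[j]) ** 2 for x, c in zip(X, nearest)) / len(X), 1e-8)
           for j in range(len(X[0]))]
    return [list(c) for c in centers], var


def penalized_em(X, K, lam, n_iter=50):
    """L1-penalized EM for a K-component Gaussian mixture with a common
    diagonal covariance. X is a list of rows whose columns are centered."""
    n, p = len(X), len(X[0])
    mu, var = _init(X, K)
    pi = [1.0 / K] * K
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau[i][k].
        tau = []
        for x in X:
            lj = _log_joint(x, pi, mu, var)
            lse = _logsumexp(lj)
            tau.append([math.exp(v - lse) for v in lj])
        nk = [max(sum(t[k] for t in tau), 1e-10) for k in range(K)]
        pi = [c / n for c in nk]
        # M-step, means: weighted mean soft-thresholded at lam * var_j / n_k,
        # which sets small cluster means exactly to zero.
        for k in range(K):
            for j in range(p):
                m = sum(t[k] * x[j] for t, x in zip(tau, X)) / nk[k]
                cut = abs(m) - lam * var[j] / nk[k]
                mu[k][j] = math.copysign(cut, m) if cut > 0 else 0.0
        # M-step, shared variances given the updated means
        # (a conditional-maximization step in the style of ECM).
        var = [max(sum(t[k] * (x[j] - mu[k][j]) ** 2
                       for t, x in zip(tau, X) for k in range(K)) / n, 1e-8)
               for j in range(p)]
    return pi, mu, var


def modified_bic(X, pi, mu, var):
    """-2 log L + log(n) * d_e, where d_e = (K - 1) + p + q and q counts the
    nonzero estimated means (assumed form of the effective parameter count)."""
    n, p, K = len(X), len(X[0]), len(pi)
    loglik = sum(_logsumexp(_log_joint(x, pi, mu, var)) for x in X)
    q = sum(1 for row in mu for m in row if m != 0.0)
    return -2.0 * loglik + math.log(n) * (K - 1 + p + q)


def selected_variables(mu):
    """Indices of variables with a nonzero mean in at least one cluster."""
    return [j for j in range(len(mu[0])) if any(row[j] != 0.0 for row in mu)]
```

To choose K and λ, one would fit `penalized_em` over a small grid of values and keep the fit with the smallest `modified_bic`. `selected_variables` then lists the variables that survive thresholding. Because the threshold λσ_j²/n_k scales with each variable's variance, the same λ acts comparably across variables measured on different scales.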

Cite

Text

Pan and Shen. "Penalized Model-Based Clustering with Application to Variable Selection." Journal of Machine Learning Research, 2007.

Markdown

[Pan and Shen. "Penalized Model-Based Clustering with Application to Variable Selection." Journal of Machine Learning Research, 2007.](https://mlanthology.org/jmlr/2007/pan2007jmlr-penalized/)

BibTeX

@article{pan2007jmlr-penalized,
  title     = {{Penalized Model-Based Clustering with Application to Variable Selection}},
  author    = {Pan, Wei and Shen, Xiaotong},
  journal   = {Journal of Machine Learning Research},
  year      = {2007},
  pages     = {1145--1164},
  volume    = {8},
  url       = {https://mlanthology.org/jmlr/2007/pan2007jmlr-penalized/}
}