Feature Subset Selection and Order Identification for Unsupervised Learning

Abstract

This paper explores the problem of feature subset selection for unsupervised learning within the wrapper framework. In particular, we examine feature subset selection wrapped around expectation-maximization (EM) clustering with order identification (identifying the number of clusters in the data). We investigate two different performance criteria for evaluating candidate feature subsets: scatter separability and maximum likelihood. When the "true" number of clusters k is unknown, our experiments on simulated Gaussian data and real data sets show that incorporating the search for k within the feature selection procedure obtains better "class" accuracy than fixing k to be the number of classes. There are two reasons: 1) the "true" number of Gaussian components is not necessarily equal to the number of classes and 2) clustering with different feature subsets can result in different numbers of "true" clusters. Our empirical evaluation shows that feature selection redu...
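The abstract names scatter separability as one criterion for scoring a candidate feature subset but does not give its formula. The sketch below assumes the common discriminant-analysis form, trace(S_w^{-1} S_b). Here S_w is the prior-weighted within-cluster scatter and S_b is the between-cluster scatter of the cluster means around the overall mean. Two further simplifications: it uses hard cluster labels rather than EM posterior probabilities, and it holds the labels fixed. In the wrapper setting described above, the clustering (and the choice of k) would be rerun for every candidate subset. The function names `scatter_separability` and `_solve_trace` are hypothetical illustrations, not the paper's code.

```python
from typing import List, Sequence


def _mean(rows: List[List[float]]) -> List[float]:
    """Column-wise mean of a non-empty list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def _solve_trace(a: List[List[float]], b: List[List[float]]) -> float:
    """Return trace(A^{-1} B) by solving A X = B with Gauss-Jordan elimination."""
    d = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(d)]  # augmented matrix [A | B]
    for col in range(d):
        # Partial pivoting: bring the largest remaining entry into the pivot row.
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("within-cluster scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(d):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The right-hand block now holds X = A^{-1} B; sum its diagonal.
    return sum(m[i][d + i] for i in range(d))


def scatter_separability(points: Sequence[Sequence[float]],
                         labels: Sequence[int],
                         features: Sequence[int]) -> float:
    """Assumed criterion trace(S_w^{-1} S_b) on the chosen feature columns."""
    sub = [[p[f] for f in features] for p in points]
    n, d = len(sub), len(features)
    overall = _mean(sub)
    sw = [[0.0] * d for _ in range(d)]
    sb = [[0.0] * d for _ in range(d)]
    for c in sorted(set(labels)):
        members = [x for x, lab in zip(sub, labels) if lab == c]
        prior = len(members) / n
        mu = _mean(members)
        # Within-cluster scatter: prior * (1/n_c) * sum(diff diff^T) = (1/n) * sum.
        for x in members:
            diff = [x[j] - mu[j] for j in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j] / n
        # Between-cluster scatter: prior-weighted spread of the cluster means.
        dm = [mu[j] - overall[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                sb[i][j] += prior * dm[i] * dm[j]
    return _solve_trace(sw, sb)
```

On the toy data `[[0, 0], [2, 5], [10, 5], [12, 0]]` with labels `[0, 0, 1, 1]`, feature 0 alone scores 25.0 and feature 1 alone scores 0.0, because its cluster means coincide. Both features together also score 25.0, so the uninformative feature adds nothing. Larger values indicate clusters that are more compact relative to their separation.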

Cite

Text

Dy and Brodley. "Feature Subset Selection and Order Identification for Unsupervised Learning." International Conference on Machine Learning, 2000.

BibTeX

@inproceedings{dy2000icml-feature,
  title     = {{Feature Subset Selection and Order Identification for Unsupervised Learning}},
  author    = {Dy, Jennifer G. and Brodley, Carla E.},
  booktitle = {International Conference on Machine Learning},
  year      = {2000},
  pages     = {247--254},
  url       = {https://mlanthology.org/icml/2000/dy2000icml-feature/}
}