Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links

Abstract

High-dimensional data are prevalent in many machine learning applications, and feature selection is a useful technique for alleviating the curse of dimensionality. Unsupervised feature selection tends to be more challenging than its supervised counterpart because class labels are unavailable. State-of-the-art approaches usually rely on pseudo labels, selecting discriminative features by their regression coefficients, but pseudo labels derived from clustering are usually inaccurate. In this paper, we propose a new perspective for unsupervised feature selection: Discriminatively Exploiting Similarity (DES). By forming similar and dissimilar data pairs, implicit discriminative information can be exploited, and the similar/dissimilar relationship of the pairs can serve as guidance for feature selection. Based on this idea, we propose hypothesis-testing-based and classification-based methods as instantiations of the DES framework. We evaluate the proposed approaches extensively on six real-world datasets. Experimental results demonstrate that our approaches significantly outperform state-of-the-art unsupervised methods. More surprisingly, our unsupervised method even achieves performance comparable to a supervised feature selection method. Code related to this chapter is available at: http://bdsc.lab.uic.edu/resources.html .
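The pair-based idea in the abstract can be illustrated with a small sketch. The abstract does not say how pairs are formed or which test is used, so everything specific here is an assumption: similar pairs (pseudo must-links) are taken from k nearest neighbours, dissimilar pairs are drawn at random, and each feature is scored with a Welch t-statistic on its absolute pairwise differences. The function name `des_t_scores` and its parameters are hypothetical and are not taken from the paper or its released code.

```python
import math
import random


def des_t_scores(X, k=3, seed=0):
    """Score each feature by how much larger its gap is on dissimilar
    pairs than on similar pairs (illustrative assumption, not the paper's
    exact procedure). Requires k < len(X)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Assumed pseudo must-links: each point paired with its k nearest neighbours.
    similar = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(X[i], X[j]))
        similar.extend((i, j) for j in others[:k])

    # Assumed dissimilar pairs: the same number of randomly drawn pairs.
    dissimilar = []
    while len(dissimilar) < len(similar):
        i, j = rng.randrange(n), rng.randrange(n)
        if i != j:
            dissimilar.append((i, j))

    def welch_t(a, b):
        # Welch's t-statistic for mean(b) - mean(a), unequal variances.
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
        var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
        se = math.sqrt(var_a / len(a) + var_b / len(b))
        return (mean_b - mean_a) / se if se > 0 else 0.0

    scores = []
    for f in range(d):
        sim_gap = [abs(X[i][f] - X[j][f]) for i, j in similar]
        dis_gap = [abs(X[i][f] - X[j][f]) for i, j in dissimilar]
        scores.append(welch_t(sim_gap, dis_gap))
    return scores


# Toy data: feature 0 separates two groups, feature 1 is small-scale noise.
X = ([[0.1 * i, 0.01 * (i * 7 % 5)] for i in range(10)]
     + [[10 + 0.1 * i, 0.01 * (i * 3 % 5)] for i in range(10)])
scores = des_t_scores(X)
best = max(range(len(scores)), key=scores.__getitem__)
```

A higher score means the feature changes little across similar pairs but noticeably across dissimilar ones, which is the kind of discriminative signal the abstract describes. On the toy data, `best` should point at feature 0. Real data would normally need feature scaling before the neighbour search, which this sketch omits.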

Cite

Text

Wei et al. "Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71249-9_17

Markdown

[Wei et al. "Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.](https://mlanthology.org/ecmlpkdd/2017/wei2017ecmlpkdd-rethinking/) doi:10.1007/978-3-319-71249-9_17

BibTeX

@inproceedings{wei2017ecmlpkdd-rethinking,
  title     = {{Rethinking Unsupervised Feature Selection: From Pseudo Labels to Pseudo Must-Links}},
  author    = {Wei, Xiaokai and Xie, Sihong and Cao, Bokai and Yu, Philip S.},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2017},
  pages     = {272-287},
  doi       = {10.1007/978-3-319-71249-9_17},
  url       = {https://mlanthology.org/ecmlpkdd/2017/wei2017ecmlpkdd-rethinking/}
}