Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution

Abstract

Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to the efficiency and effectiveness of many existing feature selection methods. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.

Proceedings of the Twentieth International Conference on Machine Learning (ICML)
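The abstract describes the filter only at a high level, so the following is a minimal sketch of how such a method might look rather than the authors' implementation. It assumes discrete features, uses symmetric uncertainty as the correlation measure, and treats a feature as redundant when its correlation with an already selected feature is at least as high as its correlation with the class. None of these details are stated in the abstract; the function names, the `threshold` parameter, and the strict `>` relevance test are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """Normalized mutual information: 2 * I(X; Y) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


def fcbf_sketch(features, target, threshold=0.0):
    """Select relevant, non-redundant features (illustrative sketch only).

    features: dict mapping a feature name to a list of discrete values.
    target:   list of class labels, same length as each feature column.
    threshold: assumed relevance cutoff; not taken from the abstract.
    """
    # Step 1: score each feature against the class and keep the relevant ones.
    relevance = {
        name: symmetric_uncertainty(column, target)
        for name, column in features.items()
    }
    candidates = sorted(
        (name for name, score in relevance.items() if score > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Step 2: take the strongest remaining feature, then drop every weaker
    # candidate that correlates with it at least as much as with the class.
    # Only selected features are compared against the rest, which avoids a
    # full pairwise comparison of all features.
    selected = []
    while candidates:
        best = candidates.pop(0)
        selected.append(best)
        candidates = [
            name
            for name in candidates
            if symmetric_uncertainty(features[best], features[name])
            < relevance[name]
        ]
    return selected


target = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],  # perfectly predictive of the class
    "b": [0, 0, 1, 1, 0, 0, 1, 1],  # exact duplicate of "a", so redundant
    "c": [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the class
}
print(fcbf_sketch(features, target))
```

In this toy input, "c" is dropped in the relevance step and "b" is dropped as redundant with "a", so only "a" is selected. Continuous features would need to be discretized first, which this sketch does not attempt.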

Cite

Text

Yu and Liu. "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution." International Conference on Machine Learning, 2003.

Markdown

[Yu and Liu. "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution." International Conference on Machine Learning, 2003.](https://mlanthology.org/icml/2003/yu2003icml-feature/)

BibTeX

@inproceedings{yu2003icml-feature,
  title     = {{Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution}},
  author    = {Yu, Lei and Liu, Huan},
  booktitle = {International Conference on Machine Learning},
  year      = {2003},
  pages     = {856--863},
  url       = {https://mlanthology.org/icml/2003/yu2003icml-feature/}
}