Including Multi-Feature Interactions and Redundancy for Feature Ranking in Mixed Datasets
Abstract
Feature ranking helps to gain knowledge and to identify the relevant features in a high-dimensional dataset. However, in several datasets, a few features by themselves might have small correlation with the target classes, but when combined with some other features, they can be strongly correlated with the target. This means that multiple features exhibit interactions among themselves. It is necessary to rank the features based on these interactions for better analysis and classifier performance. However, evaluating these interactions on large datasets is computationally challenging. Furthermore, datasets often have features with redundant information. Using such redundant features hinders both the efficiency and the generalization capability of the classifier. The major challenge is to efficiently rank the features based on relevance and redundancy on mixed datasets. In this work, we propose a filter-based framework based on Relevance and Redundancy (RaR). RaR computes a single score that quantifies the feature relevance by considering interactions between features and redundancy. The top-ranked features of RaR are characterized by maximum relevance and non-redundancy. The evaluation on synthetic and real-world datasets demonstrates that our approach outperforms several state-of-the-art feature selection techniques. Code and data related to this chapter are available at: https://doi.org/10.6084/m9.figshare.5418706.
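The two notions in the abstract, feature interaction and redundancy, can be illustrated with a small toy example. The sketch below is not the RaR score: it uses plain empirical mutual information on a hypothetical XOR dataset as a stand-in, and the names `mutual_information`, `f1`, `f2`, `f1_copy`, and `target` are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in count_xy.items()
    )


# Hypothetical toy data: all combinations of two binary features,
# repeated 25 times, with the target defined as their XOR.
rows = list(product([0, 1], repeat=2)) * 25
f1 = [a for a, _ in rows]
f2 = [b for _, b in rows]
target = [a ^ b for a, b in rows]

# Interaction: each feature alone says nothing about the target,
# but the pair (f1, f2) determines it completely.
mi_f1 = mutual_information(f1, target)
mi_f2 = mutual_information(f2, target)
mi_joint = mutual_information(list(zip(f1, f2)), target)

# Redundancy: an exact copy of f1 shares all of its information with f1,
# so adding it to a feature set contributes nothing new.
f1_copy = list(f1)
mi_redundant = mutual_information(f1_copy, f1)

print(f"I(f1; target)       = {mi_f1:.3f} bits")
print(f"I(f2; target)       = {mi_f2:.3f} bits")
print(f"I((f1, f2); target) = {mi_joint:.3f} bits")
print(f"I(f1_copy; f1)      = {mi_redundant:.3f} bits")
```

On this toy data each feature alone carries 0 bits about the target while the pair carries 1 bit, which is the kind of interaction a purely univariate ranking would miss. The copied feature shares its full 1 bit with `f1`, which is the kind of redundancy a ranking should penalize.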
Cite
Text
Shekar et al. "Including Multi-Feature Interactions and Redundancy for Feature Ranking in Mixed Datasets." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71249-9_15
Markdown
[Shekar et al. "Including Multi-Feature Interactions and Redundancy for Feature Ranking in Mixed Datasets." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.](https://mlanthology.org/ecmlpkdd/2017/shekar2017ecmlpkdd-including/) doi:10.1007/978-3-319-71249-9_15
BibTeX
@inproceedings{shekar2017ecmlpkdd-including,
title = {{Including Multi-Feature Interactions and Redundancy for Feature Ranking in Mixed Datasets}},
author = {Shekar, Arvind Kumar and Bocklisch, Tom and Sánchez, Patricia Iglesias and Straehle, Christoph Nikolas and Müller, Emmanuel},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2017},
pages = {239--255},
doi = {10.1007/978-3-319-71249-9_15},
url = {https://mlanthology.org/ecmlpkdd/2017/shekar2017ecmlpkdd-including/}
}