Improving Mutual Information Based Feature Selection by Boosting Unique Relevance
Abstract
Mutual Information (MI) based feature selection uses MI to evaluate each feature and shortlist a relevant feature subset, in order to address issues associated with high-dimensional datasets. Despite the effectiveness of MI in feature selection, we notice that many state-of-the-art algorithms disregard the so-called unique relevance (UR) of features, which is a necessary condition for the optimal feature subset. In our study of five representative MI based feature selection (MIBFS) algorithms, we find that all of them underperform because they ignore the UR of features and arrive at a suboptimal selected feature subset. We point out that the heart of the problem is that all these MIBFS algorithms follow the criterion of Maximize Relevance with Minimum Redundancy (MRwMR), which does not explicitly target UR. This motivates us to augment the existing criterion with the objective of boosting unique relevance (BUR), leading to a new criterion called MRwMR-BUR. Depending on the task being addressed, MRwMR-BUR has two variants, termed MRwMR-BUR-KSG and MRwMR-BUR-CLF, which estimate UR differently. MRwMR-BUR-KSG estimates UR via a nearest-neighbor based approach called the KSG estimator and is designed for three major tasks: (i) classification performance (i.e., higher classification accuracy); (ii) feature interpretability (i.e., a more precise selected feature subset for practitioners to explore the hidden relationship between features and labels); and (iii) classifier generalization (i.e., a selected feature subset that generalizes well across classifiers). MRwMR-BUR-CLF estimates UR via a classifier based approach. It adapts UR to different classifiers, further improving the competitiveness of MRwMR-BUR for classification performance oriented tasks. The performance of MRwMR-BUR-KSG and MRwMR-BUR-CLF is validated via experiments using six public datasets and four popular classifiers.
Specifically, as compared to MRwMR, the proposed MRwMR-BUR-KSG improves the test accuracy by 2% – 3% with 25% – 30% fewer features being selected, without increasing the algorithm complexity. MRwMR-BUR-CLF further improves the classification performance by 3.8% – 5.5% (relative to MRwMR), and it also outperforms three popular classifier dependent feature selection methods.
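The abstract contrasts two styles of estimating a feature's usefulness: a nearest-neighbor (KSG-style) MI estimate and a classifier-based estimate. The sketch below illustrates both styles on a toy dataset, assuming scikit-learn's `mutual_info_classif` (which uses a KSG-type nearest-neighbor estimator) as a stand-in for the paper's KSG estimation, and an accuracy drop upon removing a feature as a hypothetical proxy for its classifier-based unique relevance; the paper's exact UR definition may differ.

```python
# Illustrative sketch only: KSG-style per-feature MI vs. a hypothetical
# classifier-based unique-relevance proxy (accuracy drop when a feature
# is removed). Not the paper's exact MRwMR-BUR procedure.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# (1) KSG-style estimate: nearest-neighbor MI between each feature and labels.
mi = mutual_info_classif(X, y, n_neighbors=3, random_state=0)

# (2) Classifier-based proxy: how much cross-validated accuracy drops
#     when feature i is left out of the feature set.
clf = LogisticRegression(max_iter=1000)
full_acc = cross_val_score(clf, X, y, cv=5).mean()
ur_proxy = []
for i in range(X.shape[1]):
    X_minus_i = np.delete(X, i, axis=1)
    acc = cross_val_score(clf, X_minus_i, y, cv=5).mean()
    ur_proxy.append(full_acc - acc)

for i, (m, u) in enumerate(zip(mi, ur_proxy)):
    print(f"feature {i}: MI estimate {m:.3f}, UR proxy {u:+.3f}")
```

A feature with high MI but a near-zero UR proxy is relevant yet redundant (other features carry the same information), which is exactly the distinction the BUR objective is meant to surface.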
Cite
Text
Liu and Motani. "Improving Mutual Information Based Feature Selection by Boosting Unique Relevance." Journal of Artificial Intelligence Research, 2025. doi:10.1613/JAIR.1.17219
Markdown
[Liu and Motani. "Improving Mutual Information Based Feature Selection by Boosting Unique Relevance." Journal of Artificial Intelligence Research, 2025.](https://mlanthology.org/jair/2025/liu2025jair-improving/) doi:10.1613/JAIR.1.17219
BibTeX
@article{liu2025jair-improving,
title = {{Improving Mutual Information Based Feature Selection by Boosting Unique Relevance}},
author = {Liu, Shiyu and Motani, Mehul},
journal = {Journal of Artificial Intelligence Research},
year = {2025},
pages = {1267-1292},
doi = {10.1613/JAIR.1.17219},
volume = {82},
url = {https://mlanthology.org/jair/2025/liu2025jair-improving/}
}