Efficient Feature Selection Using Shrinkage Estimators

Abstract

Information theoretic feature selection methods quantify the importance of each feature by estimating mutual information terms that capture relevancy, redundancy and complementarity. These terms are commonly estimated by maximum likelihood, while the use of shrinkage methods instead remains an under-explored area of research. Our work proposes a novel shrinkage method for data-efficient estimation of information theoretic terms. Its small-sample behaviour makes it particularly suitable for estimating discrete distributions with a large number of categories (bins). Using our novel estimators, we derive a framework for generating feature selection criteria that capture any high-order feature interaction for redundancy and complementarity. We perform a thorough empirical study across datasets from diverse sources, using various evaluation measures. Our first finding is that our shrinkage-based methods achieve better results while keeping the same computational cost as the simple maximum-likelihood-based methods. Furthermore, under our framework we derive efficient novel high-order criteria that outperform state-of-the-art methods in various tasks.
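
To make the contrast between maximum likelihood and shrinkage estimation concrete, the following is a minimal sketch in Python. It is not the estimator proposed in the paper: it assumes a generic James-Stein-style shrinkage of the joint cell frequencies toward the uniform distribution, followed by a plug-in mutual information computation. The function names (`ml_probs`, `shrink_probs`, `mutual_information`, `estimate_mi`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ml_probs(counts):
    """Maximum likelihood estimate: relative frequencies of the cells."""
    n = sum(counts)
    return [c / n for c in counts]


def shrink_probs(counts):
    """Shrink the relative frequencies toward the uniform distribution.

    Assumption: a generic James-Stein-style shrinkage intensity is used here,
    which is not necessarily the estimator derived in the paper.
    """
    n = sum(counts)
    k = len(counts)
    p_ml = [c / n for c in counts]
    target = 1.0 / k
    denom = (n - 1) * sum((target - p) ** 2 for p in p_ml)
    if denom == 0:
        lam = 1.0  # single sample, or frequencies already uniform
    else:
        lam = (1.0 - sum(p * p for p in p_ml)) / denom
        lam = min(1.0, max(0.0, lam))  # keep the intensity in [0, 1]
    return [lam * target + (1.0 - lam) * p for p in p_ml]


def mutual_information(joint):
    """Plug-in mutual information (in nats) of a joint probability table."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log(pxy / (px[i] * py[j]))
    return mi


def estimate_mi(xs, ys, shrink=True):
    """Estimate I(X;Y) from paired discrete samples xs and ys."""
    x_vals = sorted(set(xs))
    y_vals = sorted(set(ys))
    counts = Counter(zip(xs, ys))
    # Flatten the contingency table row by row (one row per value of X).
    flat = [counts[(x, y)] for x in x_vals for y in y_vals]
    probs = shrink_probs(flat) if shrink else ml_probs(flat)
    width = len(y_vals)
    joint = [probs[i * width:(i + 1) * width] for i in range(len(x_vals))]
    return mutual_information(joint)


if __name__ == "__main__":
    feature = [0, 0, 1, 1, 2, 2, 0, 1]
    label = [0, 0, 1, 1, 0, 1, 0, 1]
    print("ML estimate:       ", estimate_mi(feature, label, shrink=False))
    print("Shrinkage estimate:", estimate_mi(feature, label, shrink=True))
```

In this sketch the shrinkage step costs one extra pass over the contingency table, and empty cells receive a small positive probability whenever the shrinkage intensity is non-zero. A relevancy-based ranking would then call `estimate_mi` once per candidate feature against the label.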

Cite

Text

Sechidis et al. "Efficient Feature Selection Using Shrinkage Estimators." Machine Learning, 2019. doi:10.1007/S10994-019-05795-1

Markdown

[Sechidis et al. "Efficient Feature Selection Using Shrinkage Estimators." Machine Learning, 2019.](https://mlanthology.org/mlj/2019/sechidis2019mlj-efficient/) doi:10.1007/S10994-019-05795-1

BibTeX

@article{sechidis2019mlj-efficient,
  title     = {{Efficient Feature Selection Using Shrinkage Estimators}},
  author    = {Sechidis, Konstantinos and Azzimonti, Laura and Pocock, Adam Craig and Corani, Giorgio and Weatherall, James and Brown, Gavin},
  journal   = {Machine Learning},
  year      = {2019},
  pages     = {1261--1286},
  doi       = {10.1007/S10994-019-05795-1},
  volume    = {108},
  url       = {https://mlanthology.org/mlj/2019/sechidis2019mlj-efficient/}
}