Feature Selection Using E-Values

Abstract

In the context of supervised learning, we introduce the concept of e-values. An e-value is a scalar quantity that represents the proximity of the sampling distribution of parameter estimates in a model trained on a subset of features to that of the model trained on all features (i.e., the full model). Under general conditions, a rank ordering of e-values separates models that contain all essential features from those that do not. For a p-dimensional feature space, this requires fitting only the full model and evaluating p+1 models, as opposed to the traditional requirement of fitting and evaluating 2^p models. The e-values framework is applicable to a wide range of parametric models. We use data depths and a fast resampling-based algorithm to implement a feature selection procedure, and we provide consistency results. Through experiments across several model settings and synthetic and real datasets, we establish that e-values are a promising general alternative to existing model-specific methods of feature selection.
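
To make the mechanics concrete, here is a minimal Python sketch, not the paper's exact procedure: it uses ordinary least squares as the working model, a plain nonparametric bootstrap in place of the paper's fast resampling scheme, and Mahalanobis depth as one convenient choice of data depth. The e-value of each drop-one-feature model is the mean depth of its zero-padded coefficient estimates relative to the full-model sampling distribution, and a feature is kept when dropping it lowers the e-value. The function names (`evalue_feature_selection`, `mahalanobis_depth`) are illustrative, not from the paper.

```python
import numpy as np

def mahalanobis_depth(points, ref):
    """Mahalanobis depth of each row of `points` w.r.t. the sample `ref`."""
    mu = ref.mean(axis=0)
    inv = np.linalg.pinv(np.cov(ref, rowvar=False))
    d = points - mu
    md2 = np.einsum('ij,jk,ik->i', d, inv, d)  # squared Mahalanobis distances
    return 1.0 / (1.0 + md2)

def evalue_feature_selection(X, y, n_boot=500, rng=None):
    """Drop-one-feature e-values for OLS; only the full model is ever fit."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Bootstrap replicates of the full-model coefficient estimate.
    betas = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        betas[b] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    # e-value of a model = mean depth of its (zero-padded) estimates
    # w.r.t. the full-model sampling distribution.
    e_full = mahalanobis_depth(betas, betas).mean()
    e_drop = np.empty(p)
    for j in range(p):
        reduced = betas.copy()
        reduced[:, j] = 0.0  # plug in 0 for the dropped coefficient
        e_drop[j] = mahalanobis_depth(reduced, betas).mean()
    # A feature is deemed essential if dropping it lowers the e-value:
    # the reduced model's estimates sit farther from the full model's.
    selected = np.where(e_drop < e_full)[0]
    return selected, e_full, e_drop

# Toy check: only the first two of five features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)
selected, e_full, e_drop = evalue_feature_selection(X, y, rng=0)
print(selected)  # typically [0 1]
```

Note that only p+1 model evaluations occur: the full model is fit once per bootstrap resample, and each reduced model is evaluated by zeroing a coordinate of those same estimates rather than by refitting.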

Cite

Text

Majumdar and Chatterjee. "Feature Selection Using E-Values." International Conference on Machine Learning, 2022.

Markdown

[Majumdar and Chatterjee. "Feature Selection Using E-Values." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/majumdar2022icml-feature/)

BibTeX

@inproceedings{majumdar2022icml-feature,
  title     = {{Feature Selection Using E-Values}},
  author    = {Majumdar, Subhabrata and Chatterjee, Snigdhansu},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {14753--14773},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/majumdar2022icml-feature/}
}