Differentiable Feature Selection by Discrete Relaxation

Abstract

In this paper, we introduce Differentiable Feature Selection, a gradient-based search algorithm for feature selection. Our approach extends a recent result on the estimation of learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in linear time and space with respect to both the number of features D and the sample size N. This, along with a discrete-to-continuous relaxation of the search domain, enables an efficient, gradient-based search over feature subsets for very large datasets. Our algorithm exploits higher-order correlations between features and targets in both the N > D and N < D regimes, unlike approaches that ignore such interactions and/or handle only one regime. We provide experimental demonstrations of the algorithm in both small and large sample- and feature-size settings.
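
To make the relaxation idea concrete, below is a minimal, hypothetical sketch of a discrete-to-continuous relaxation for feature selection: a binary inclusion mask over the D features is relaxed to continuous logits passed through a sigmoid gate, and the gates are trained jointly with a linear predictor by gradient descent. The sigmoid gating, least-squares objective, sparsity penalty lam, and all names here are illustrative assumptions; this is not the learnability-based estimator or the exact objective used in the paper.

import torch

# Hypothetical sketch (not the paper's algorithm): relax a binary feature-
# inclusion mask to continuous logits, gate each feature with a sigmoid,
# and train the gates jointly with a linear predictor by gradient descent.

torch.manual_seed(0)
N, D, k = 512, 100, 5                        # samples, features, informative features
X = torch.randn(N, D)
w_true = torch.zeros(D)
w_true[:k] = 2.0                             # only the first k features carry signal
y = X @ w_true + 0.1 * torch.randn(N)

logits = torch.zeros(D, requires_grad=True)  # relaxed (continuous) selection variables
w = torch.zeros(D, requires_grad=True)       # linear model weights
opt = torch.optim.Adam([logits, w], lr=0.05)
lam = 1e-2                                   # assumed sparsity penalty on the gates

for step in range(500):
    gates = torch.sigmoid(logits)            # soft inclusion probabilities in (0, 1)
    loss = (((X * gates) @ w - y) ** 2).mean() + lam * gates.sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

selected = (torch.sigmoid(logits) > 0.5).nonzero(as_tuple=True)[0]
print("selected features:", selected.tolist())

After training, gates near zero indicate dropped features. Per the abstract, the paper couples such a relaxation with its mini-batch learnability estimate rather than the squared-error objective assumed above.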

Cite

Text

Sheth and Fusi. "Differentiable Feature Selection by Discrete Relaxation." Artificial Intelligence and Statistics, 2020.

Markdown

[Sheth and Fusi. "Differentiable Feature Selection by Discrete Relaxation." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/sheth2020aistats-differentiable/)

BibTeX

@inproceedings{sheth2020aistats-differentiable,
  title     = {{Differentiable Feature Selection by Discrete Relaxation}},
  author    = {Sheth, Rishit and Fusi, Nicoló},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {1564--1572},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/sheth2020aistats-differentiable/}
}