Differentiable Feature Selection by Discrete Relaxation
Abstract
In this paper, we introduce Differentiable Feature Selection, a gradient-based search algorithm for feature selection. Our approach extends a recent result on the estimation of learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in linear time and space with respect to both the number of features D and the sample size N. This, together with a discrete-to-continuous relaxation of the search domain, yields an efficient, gradient-based search over feature subsets for very large datasets. Our algorithm exploits higher-order correlations between features and targets in both the N>D and N<D regimes, unlike approaches that ignore such interactions and/or consider only one regime. We demonstrate the algorithm experimentally in settings with both small and large sample and feature sizes.
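The core idea in the abstract, relaxing the discrete domain of feature subsets into a continuous one so that gradients can drive the search, can be illustrated with a minimal sketch. This is not the authors' algorithm (their objective is built on a learnability estimator not reproduced here); it is a generic sigmoid-relaxed feature mask trained jointly with a linear model under a sparsity penalty, and the function name relaxed_feature_selection, the squared loss, and all hyperparameters are illustrative assumptions.

import numpy as np

def relaxed_feature_selection(X, y, lr=0.1, lam=0.02, steps=500, seed=0):
    """Hypothetical sketch: gradient search over a sigmoid-relaxed feature mask.

    Minimizes the mean squared error of a linear model on masked features,
    plus a sparsity penalty lam * sum(m) that pushes mask entries toward 0.
    This loss is a stand-in for the paper's learnability-based objective.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    theta = np.zeros(D)                     # mask logits; sigmoid(0) = 0.5
    w = rng.normal(scale=0.01, size=D)      # linear-model weights
    for _ in range(steps):
        m = 1.0 / (1.0 + np.exp(-theta))    # relaxed mask in (0, 1)^D
        r = (X * m) @ w - y                 # residuals of the masked model
        g_w = (X * m).T @ r / N             # d(MSE)/dw
        g_m = (X * w).T @ r / N + lam       # d(MSE + penalty)/dm
        theta -= lr * g_m * m * (1.0 - m)   # chain rule through the sigmoid
        w -= lr * g_w
    return theta                            # large positive logits ~ selected

# Toy usage: only the first 3 of 10 features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=200)
print(np.round(relaxed_feature_selection(X, y), 2))

Each pass here costs O(ND) time and space, in the spirit of the linear scaling the abstract claims for the full method; thresholding sigmoid(theta) at the end recovers a discrete feature subset.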
Cite
Text
Sheth and Fusi. "Differentiable Feature Selection by Discrete Relaxation." Artificial Intelligence and Statistics, 2020.
Markdown
[Sheth and Fusi. "Differentiable Feature Selection by Discrete Relaxation." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/sheth2020aistats-differentiable/)
BibTeX
@inproceedings{sheth2020aistats-differentiable,
title = {{Differentiable Feature Selection by Discrete Relaxation}},
author = {Sheth, Rishit and Fusi, Nicolò},
booktitle = {Artificial Intelligence and Statistics},
year = {2020},
pages = {1564--1572},
volume = {108},
url = {https://mlanthology.org/aistats/2020/sheth2020aistats-differentiable/}
}