Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination

Abstract

Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach.
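
The abstract compresses the core trick: real features must prove themselves against artificial variables, i.e., permuted copies of the real columns that are irrelevant to the target by construction. The sketch below illustrates just that contrast step with a scikit-learn random forest. It is a minimal illustration under stated assumptions, not the authors' full ACE procedure: it omits the masking/redundancy-elimination stage and the serial (residual) ensemble for secondary effects, and the quantile, replicate count, and voting rule here are illustrative choices rather than the paper's settings.

# Minimal sketch of the artificial-contrast idea: a real feature is kept
# only if its ensemble importance beats permuted "shadow" copies of the
# data. Thresholds and the voting rule below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)
n_real = X.shape[1]

n_rounds = 20
wins = np.zeros(n_real)
for _ in range(n_rounds):
    # Artificial variables: permute each column independently, preserving
    # marginal distributions but breaking any association with y.
    shadows = rng.permuted(X, axis=0)
    X_aug = np.hstack([X, shadows])

    forest = RandomForestClassifier(n_estimators=200)
    forest.fit(X_aug, y)
    imp = forest.feature_importances_

    # A real feature "wins" a round if its importance exceeds a high
    # quantile (75th here, an illustrative choice) of the shadow scores.
    threshold = np.percentile(imp[n_real:], 75)
    wins += imp[:n_real] > threshold

# Keep features that beat the shadow threshold in a majority of rounds.
selected = np.where(wins > 0.5 * n_rounds)[0]
print("selected features:", selected)

Because each feature is compared against the importance distribution of its own permuted contrasts rather than a fixed cutoff, irrelevant features are rejected even though tree importances are noisy and scale-dependent.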

Cite

Text

Tuv et al. "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination." Journal of Machine Learning Research, 10:1341-1366, 2009.

Markdown

[Tuv et al. "Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination." Journal of Machine Learning Research, 10:1341-1366, 2009.](https://mlanthology.org/jmlr/2009/tuv2009jmlr-feature/)

BibTeX

@article{tuv2009jmlr-feature,
  title     = {{Feature Selection with Ensembles, Artificial Variables, and Redundancy Elimination}},
  author    = {Tuv, Eugene and Borisov, Alexander and Runger, George and Torkkola, Kari},
  journal   = {Journal of Machine Learning Research},
  year      = {2009},
  pages     = {1341--1366},
  volume    = {10},
  url       = {https://mlanthology.org/jmlr/2009/tuv2009jmlr-feature/}
}