Measuring the Stability of Feature Selection
Abstract
In feature selection algorithms, “stability” is the sensitivity of the chosen feature set to variations in the supplied training data. As such, it can be seen as analogous to the statistical variance of a predictor. However, unlike variance, there is no unique definition of stability, with numerous measures proposed over 15 years of literature. In this paper, instead of defining a new measure, we start from an axiomatic point of view and identify what properties would be desirable. Somewhat surprisingly, we find that the simple Pearson’s correlation coefficient has all necessary properties, yet has somehow been overlooked in favour of more complex alternatives. Finally, we illustrate how the use of this measure in practice can provide better interpretability and more confidence in the model selection process. The data and software related to this paper are available at https://github.com/nogueirs/ECML2016 .
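As a minimal sketch of the idea in the abstract: if each feature selection run is encoded as a binary indicator vector over the d features, stability can be scored as the average pairwise Pearson correlation between those vectors. The function name and the simple pairwise averaging here are illustrative assumptions, not the paper's exact formulation; see the linked repository for the authors' own code.

```python
import numpy as np

def pearson_stability(Z):
    """Average pairwise Pearson correlation between selection vectors.

    Z is an (M, d) 0/1 matrix: row i marks which of the d features
    were chosen on selection run i. (Illustrative sketch of using
    Pearson's correlation as a stability measure; the averaging
    scheme here is an assumption, not the paper's exact definition.)
    """
    Z = np.asarray(Z, dtype=float)
    M = Z.shape[0]
    # Correlation matrix between runs (rows); off-diagonal entries
    # are the pairwise Pearson correlations.
    C = np.corrcoef(Z)
    # Average over the M*(M-1)/2 distinct pairs of runs.
    iu = np.triu_indices(M, k=1)
    return C[iu].mean()

# Identical selections across runs score 1 (perfectly stable);
# disjoint selections of complementary features score -1.
Z_stable = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
```

A perfectly stable selector (same features every run) scores 1; as selections diverge across runs, the score decreases.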
Cite
Text
Nogueira and Brown. "Measuring the Stability of Feature Selection." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016. doi:10.1007/978-3-319-46227-1_28

Markdown
[Nogueira and Brown. "Measuring the Stability of Feature Selection." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2016.](https://mlanthology.org/ecmlpkdd/2016/nogueira2016ecmlpkdd-measuring/) doi:10.1007/978-3-319-46227-1_28

BibTeX
@inproceedings{nogueira2016ecmlpkdd-measuring,
title = {{Measuring the Stability of Feature Selection}},
author = {Nogueira, Sarah and Brown, Gavin},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2016},
pages = {442-457},
doi = {10.1007/978-3-319-46227-1_28},
url = {https://mlanthology.org/ecmlpkdd/2016/nogueira2016ecmlpkdd-measuring/}
}