A Theoretical Characterization of Linear SVM-Based Feature Selection
Abstract
Most prevalent techniques in Support Vector Machine (SVM) feature selection are based on the intuition that features whose weights are close to zero are not required for optimal classification. In this paper we show that, indeed, in the sample limit, the irrelevant variables (in a theoretical and optimal sense) will be given zero weight by a linear SVM, in both the soft- and the hard-margin case. However, SVM-based methods have certain theoretical disadvantages too. We present examples where the linear SVM may assign zero weights to strongly relevant variables (i.e., variables required for optimal estimation of the distribution of the target variable) and where weakly relevant features (i.e., features that are superfluous for optimal feature selection given other features) may receive non-zero weights. We contrast these results with, and theoretically compare against, Markov blanket-based feature selection algorithms, which do not have such disadvantages in a broad class of distributions and can also be used for causal discovery.
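The intuition the abstract describes — that a linear SVM drives the weights of irrelevant features toward zero — can be illustrated with a toy experiment. The following is a minimal sketch, not the paper's method: it trains a soft-margin linear SVM by subgradient descent on the regularized hinge loss (a Pegasos-style update) on data where the label depends on one feature only, and inspects the learned weights. All function names and hyperparameters here are illustrative choices, not from the paper.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Soft-margin linear SVM via subgradient descent on the
    L2-regularized hinge loss: lam/2 * ||w||^2 + max(0, 1 - y(w.x + b))."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:
                # Inside the margin: hinge subgradient plus regularizer.
                w -= lr * (lam * w - y[i] * X[i])
                b += lr * y[i]
            else:
                # Outside the margin: only the regularizer pulls on w.
                w -= lr * lam * w
        lr *= 0.99  # simple step-size decay
    return w, b

# Toy data: the label depends only on feature 0; feature 1 is pure noise.
rng = np.random.default_rng(1)
n = 400
y = rng.choice([-1.0, 1.0], size=n)
X = np.column_stack([
    y + 0.3 * rng.standard_normal(n),   # relevant feature
    rng.standard_normal(n),             # irrelevant feature
])

w, b = train_linear_svm(X, y)
# The irrelevant feature's weight should be much smaller in magnitude
# than the relevant feature's weight (and would vanish in the sample limit).
print(abs(w[0]), abs(w[1]))
```

The paper's negative examples show the converse directions can fail: a strongly relevant variable may still receive zero weight, and a weakly relevant one a non-zero weight, so near-zero weight is not a fully reliable relevance test.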
Cite
Text
Hardin et al. "A Theoretical Characterization of Linear SVM-Based Feature Selection." International Conference on Machine Learning, 2004. doi:10.1145/1015330.1015421
Markdown
[Hardin et al. "A Theoretical Characterization of Linear SVM-Based Feature Selection." International Conference on Machine Learning, 2004.](https://mlanthology.org/icml/2004/hardin2004icml-theoretical/) doi:10.1145/1015330.1015421
BibTeX
@inproceedings{hardin2004icml-theoretical,
title = {{A Theoretical Characterization of Linear SVM-Based Feature Selection}},
author = {Hardin, Douglas P. and Tsamardinos, Ioannis and Aliferis, Constantin F.},
booktitle = {International Conference on Machine Learning},
year = {2004},
doi = {10.1145/1015330.1015421},
url = {https://mlanthology.org/icml/2004/hardin2004icml-theoretical/}
}