Feature Selection in SVM Text Categorization

Abstract

This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the two filtering methods on SVMs as well as a decision tree algorithm C4.5. The SVMs' results common to both filterings are that 1) the optimal number of features differed completely across categories, and 2) the average performance for all categories was best when all of the words were used. In addition, a comparison of the two filtering methods clarified that POS filtering on SVMs consistently outperformed MI filtering, which indicates that SVMs cannot find irrelevant parts of speech. These results suggest a simple strategy for SVM text categorization: use the full set of words found through a rough filtering technique like part-of-speech tagging.
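To illustrate the MI filtering the abstract refers to, here is a minimal sketch (not the authors' implementation) of ranking words by the mutual information between a term's presence in a document and the document's category label; the toy corpus and helper names are invented for illustration:

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """I(T; C) between presence/absence of `term` and the category label.

    `docs` is a list of word sets, `labels` the matching category labels.
    """
    n = len(docs)
    joint = Counter()  # counts of (term present?, category) pairs
    for doc, c in zip(docs, labels):
        joint[(int(term in doc), c)] += 1
    p_t, p_c = Counter(), Counter()  # marginal counts
    for (t, c), cnt in joint.items():
        p_t[t] += cnt
        p_c[c] += cnt
    # I(T;C) = sum_{t,c} p(t,c) * log2( p(t,c) / (p(t) p(c)) )
    return sum(
        (cnt / n) * math.log2(cnt * n / (p_t[t] * p_c[c]))
        for (t, c), cnt in joint.items()
    )

# Toy corpus: "ball" perfectly predicts the sports category.
docs = [{"ball", "game"}, {"ball", "team"}, {"market", "stock"}, {"stock", "trade"}]
labels = ["sports", "sports", "finance", "finance"]
vocab = {"ball", "game", "team", "market", "stock", "trade"}

# MI filtering keeps only the top-k words by this score.
ranked = sorted(vocab, key=lambda t: mutual_information(docs, labels, t), reverse=True)
```

Under MI filtering, training would then proceed on the top-k words of `ranked` for increasing k; the paper's finding is that on SVMs no such cutoff beat simply using all words.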

Cite

Text

Taira and Haruno. "Feature Selection in SVM Text Categorization." AAAI Conference on Artificial Intelligence, 1999.

Markdown

[Taira and Haruno. "Feature Selection in SVM Text Categorization." AAAI Conference on Artificial Intelligence, 1999.](https://mlanthology.org/aaai/1999/taira1999aaai-feature/)

BibTeX

@inproceedings{taira1999aaai-feature,
  title     = {{Feature Selection in SVM Text Categorization}},
  author    = {Taira, Hirotoshi and Haruno, Masahiko},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1999},
  pages     = {480--486},
  url       = {https://mlanthology.org/aaai/1999/taira1999aaai-feature/}
}