Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?

Abstract

The choice of the kernel function is crucial to most applications of support vector machines. In this paper, however, we show that in the case of text classification, term-frequency transformations have a larger impact on the performance of SVMs than the kernel itself. We discuss the role of importance weights (e.g. document frequency and redundancy), which is not yet fully understood in the light of model complexity and calculation cost, and we show that time-consuming lemmatization or stemming can be avoided even when classifying a highly inflectional language like German.
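The importance weighting the abstract mentions can be illustrated with the common tf-idf scheme, which scales a term's raw frequency by the inverse of its document frequency. The sketch below is a minimal, assumed illustration of this kind of transformation, not the exact weighting functions compared in the paper:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight raw term frequencies by inverse document frequency (tf-idf).

    A hypothetical minimal example of an importance-weighted text
    representation; the paper's own transformations may differ.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

docs = ["the cat sat", "the dog sat", "the cat ran"]
vecs = tfidf_vectors(docs)
# "the" occurs in every document, so its idf factor log(3/3) is 0
```

Terms that appear in every document receive zero weight, while rarer terms are emphasized; this is one way a representation choice can dominate the effect of the kernel.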

Cite

Text

Leopold and Kindermann. "Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?" Machine Learning, 2002. doi:10.1023/A:1012491419635

Markdown

[Leopold and Kindermann. "Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?" Machine Learning, 2002.](https://mlanthology.org/mlj/2002/leopold2002mlj-text/) doi:10.1023/A:1012491419635

BibTeX

@article{leopold2002mlj-text,
  title     = {{Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?}},
  author    = {Leopold, Edda and Kindermann, Jörg},
  journal   = {Machine Learning},
  year      = {2002},
  pages     = {423--444},
  doi       = {10.1023/A:1012491419635},
  volume    = {46},
  url       = {https://mlanthology.org/mlj/2002/leopold2002mlj-text/}
}