Second Order Features for Maximising Text Classification Performance

Raskutti, Bhavani; Ferrá, Herman L.; Kowalczyk, Adam

doi:10.1007/3-540-44795-4_36

ECML-PKDD 2001 pp. 419-430

Abstract

The paper demonstrates that adding automatically selected word-pairs substantially increases the accuracy of text classification, contrary to most previously reported research. The word-pairs are selected automatically using a technique based on frequencies of n-grams (sequences of characters), which takes into account both the frequencies of word-pairs and the context in which they occur. These improvements are reported for two different classifiers, support vector machines (SVM) and k-nearest neighbours (kNN), and two different text corpora. For the first, a collection of articles from PC Week magazine, the addition of word-pairs increases micro-averaged breakeven accuracy by more than 6 percentage points from a baseline accuracy (without pairs) of around 40%. For the second, the standard Reuters benchmark, an SVM classifier augmented with pairs outperforms all previously reported results.
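
As a rough illustration of what augmenting a word-level representation with word-pairs can look like, the following minimal Python sketch adds selected adjacent word-pairs to a bag-of-words feature count. It is not the selection technique described in the abstract: the n-gram-based, context-aware selection is replaced here by a plain document-frequency threshold, and the names `tokenize`, `adjacent_pairs`, `select_pairs`, `featurize`, and `min_doc_freq` are hypothetical, chosen only for this example.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, purely for illustration.
    return text.lower().split()


def adjacent_pairs(tokens):
    # Join each pair of neighbouring words into a single feature name.
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def select_pairs(docs, min_doc_freq=2):
    # Stand-in selection rule (NOT the paper's method): keep pairs that
    # occur in at least `min_doc_freq` documents.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(adjacent_pairs(tokenize(doc))))
    return {pair for pair, n in doc_freq.items() if n >= min_doc_freq}


def featurize(doc, selected_pairs):
    # First-order features: single-word counts.
    tokens = tokenize(doc)
    features = Counter(tokens)
    # Second-order features: counts of the selected word-pairs only.
    features.update(p for p in adjacent_pairs(tokens) if p in selected_pairs)
    return features


docs = [
    "support vector machines classify text",
    "support vector machines use kernels",
    "nearest neighbours classify text",
]
pairs = select_pairs(docs)
features = featurize(docs[0], pairs)
```

The resulting counts could then be passed to any classifier that accepts sparse feature vectors, such as an SVM or kNN; unselected pairs like `machines_classify` never enter the representation.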

Cite

Text

Raskutti et al. "Second Order Features for Maximising Text Classification Performance." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_36

Markdown

[Raskutti et al. "Second Order Features for Maximising Text Classification Performance." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/raskutti2001ecml-second/) doi:10.1007/3-540-44795-4_36

BibTeX

@inproceedings{raskutti2001ecml-second,
  title     = {{Second Order Features for Maximising Text Classification Performance}},
  author    = {Raskutti, Bhavani and Ferrá, Herman L. and Kowalczyk, Adam},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {419-430},
  doi       = {10.1007/3-540-44795-4_36},
  url       = {https://mlanthology.org/ecmlpkdd/2001/raskutti2001ecml-second/}
}