Second Order Features for Maximising Text Classification Performance

Abstract

The paper demonstrates that adding automatically selected word-pairs substantially increases the accuracy of text classification, contrary to most previously reported research. The word-pairs are selected using a technique based on frequencies of n-grams (sequences of characters), which takes into account both the frequencies of word-pairs and the context in which they occur. These improvements are reported for two different classifiers, support vector machines (SVM) and k-nearest neighbours (kNN), and two different text corpora. For the first, a collection of articles from PC Week magazine, the addition of word-pairs increases micro-averaged breakeven accuracy by more than 6 percentage points from a baseline accuracy (without pairs) of around 40%. For the second, the standard Reuters benchmark, an SVM classifier augmented with pairs outperforms all previously reported results.
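As a rough illustration of what augmenting a bag-of-words representation with word-pairs can look like, here is a minimal Python sketch. It is not the paper's selection technique: the abstract describes selection based on character n-gram frequencies and context, whereas this sketch swaps in a much simpler rule that keeps adjacent word-pairs appearing in at least `min_doc_freq` documents. The function names, the threshold, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; any tokenizer would do).
    return text.lower().split()


def adjacent_pairs(tokens):
    # Join each pair of neighbouring words into a single feature name.
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def select_pairs(docs, min_doc_freq=2):
    # Simplified stand-in for pair selection: keep pairs that occur
    # in at least `min_doc_freq` documents (document frequency).
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(adjacent_pairs(tokenize(doc))))
    return {pair for pair, n in doc_freq.items() if n >= min_doc_freq}


def featurize(doc, selected_pairs):
    # First-order features (single words) plus the selected
    # second-order features (word-pairs) in one count vector.
    tokens = tokenize(doc)
    feats = Counter(tokens)
    feats.update(p for p in adjacent_pairs(tokens) if p in selected_pairs)
    return feats


# Toy documents, invented for this example.
docs = [
    "new operating system released",
    "operating system update delayed",
    "system memory prices fall",
]
pairs = select_pairs(docs)
features = featurize(docs[0], pairs)
```

The resulting count dictionaries could then be passed to any classifier that accepts sparse features, such as an SVM or kNN implementation.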

Cite

Text

Raskutti et al. "Second Order Features for Maximising Text Classification Performance." European Conference on Machine Learning, 2001. doi:10.1007/3-540-44795-4_36

Markdown

[Raskutti et al. "Second Order Features for Maximising Text Classification Performance." European Conference on Machine Learning, 2001.](https://mlanthology.org/ecmlpkdd/2001/raskutti2001ecml-second/) doi:10.1007/3-540-44795-4_36

BibTeX

@inproceedings{raskutti2001ecml-second,
  title     = {{Second Order Features for Maximising Text Classification Performance}},
  author    = {Raskutti, Bhavani and Ferrá, Herman L. and Kowalczyk, Adam},
  booktitle = {European Conference on Machine Learning},
  year      = {2001},
  pages     = {419-430},
  doi       = {10.1007/3-540-44795-4_36},
  url       = {https://mlanthology.org/ecmlpkdd/2001/raskutti2001ecml-second/}
}