Distributional Features for Text Categorization

Xue, Xiao-Bing; Zhou, Zhi-Hua

doi:10.1007/11871842_47

ECML-PKDD 2006 pp. 497-508


Abstract

In previous research on text categorization, a word is usually described by features that express whether the word appears in the document or how frequently it appears. Although these features are useful, they do not fully express the information contained in the document. In this paper, distributional features, which express the distribution of a word within a document, are used to describe a word. Specifically, the compactness of the word's appearances and the position of its first appearance are characterized as features. These features are exploited through a TFIDF-style equation. Experiments show that the distributional features are useful for text categorization. Compared with using the traditional term frequency features alone, including the distributional features requires only a little additional cost, while the categorization performance can be significantly improved.
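The abstract names the two distributional features but does not give their formulas, so the following is only a rough sketch of the idea and not the authors' definitions. It assumes a document is a list of tokens split into a fixed number of roughly equal parts; the function name `distributional_features`, the parameter `num_parts`, and the choice to measure compactness as the number of parts that contain the word are all hypothetical.

```python
def distributional_features(tokens, word, num_parts=4):
    """Return (term_frequency, first_part, parts_covered) for `word`.

    Assumed, illustrative definitions (not taken from the paper):
      first_part    -- index of the part holding the first occurrence,
                       or None if the word is absent
      parts_covered -- number of distinct parts containing the word;
                       fewer parts means the appearances are more compact
    """
    n = len(tokens)
    positions = [i for i, t in enumerate(tokens) if t == word]
    if not positions:
        return 0, None, 0
    # Map each token index to one of num_parts roughly equal parts.
    parts = [min(i * num_parts // n, num_parts - 1) for i in positions]
    return len(positions), parts[0], len(set(parts))


doc = "a b a c d e f g".split()
print(distributional_features(doc, "a"))  # frequent early word
print(distributional_features(doc, "g"))  # single late occurrence
```

Under these assumptions, two words with the same term frequency can still be told apart by where they first appear and by how widely their appearances are spread, which is the kind of extra information the abstract says frequency features miss.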


Cite

Text

Xue and Zhou. "Distributional Features for Text Categorization." European Conference on Machine Learning, 2006. doi:10.1007/11871842_47

Markdown

[Xue and Zhou. "Distributional Features for Text Categorization." European Conference on Machine Learning, 2006.](https://mlanthology.org/ecmlpkdd/2006/xue2006ecml-distributional/) doi:10.1007/11871842_47

BibTeX

@inproceedings{xue2006ecml-distributional,
  title     = {{Distributional Features for Text Categorization}},
  author    = {Xue, Xiao-Bing and Zhou, Zhi-Hua},
  booktitle = {European Conference on Machine Learning},
  year      = {2006},
  pages     = {497-508},
  doi       = {10.1007/11871842_47},
  url       = {https://mlanthology.org/ecmlpkdd/2006/xue2006ecml-distributional/}
}