Bag-of-Embeddings for Text Classification

Abstract

Words are central to text classification. It has been shown that simple Naive Bayes models with word and bigram features can give highly competitive accuracies when compared to more sophisticated models with part-of-speech, syntactic and semantic features. Embeddings offer distributional features about words. We study a conceptually simple classification model that exploits multi-prototype word embeddings based on text classes. The key assumption is that words exhibit different distributional characteristics under different text classes. Based on this assumption, we train multi-prototype distributional word representations for different text classes. Given a new document, its text class is predicted by maximizing the probability of the embedding vectors of its words under each class. On two standard classification benchmark datasets, one balanced and the other imbalanced, our model outperforms state-of-the-art systems on both accuracy and macro-averaged F1 score.
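
The decision rule described in the abstract — score a document under each class's word model by summing per-word log-probabilities, then take the argmax — can be sketched as follows. This is a toy illustration, not the paper's implementation: the smoothed per-class unigram probabilities below stand in for the class-specific embedding probabilities the paper actually trains, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_class_models(labeled_docs, smoothing=1.0):
    """Estimate per-class word log-probabilities with add-one smoothing.

    In the paper these probabilities come from multi-prototype
    embeddings trained per class; a unigram model is used here only
    to illustrate the bag-of-words decision rule.
    """
    counts = defaultdict(Counter)
    for words, label in labeled_docs:
        counts[label].update(words)
    vocab = {w for c in counts.values() for w in c}
    models = {}
    for label, c in counts.items():
        total = sum(c.values()) + smoothing * len(vocab)
        models[label] = {
            w: math.log((c[w] + smoothing) / total) for w in vocab
        }
    return models

def classify(words, models):
    """Pick the class maximizing the summed per-word log-probability."""
    def score(label):
        m = models[label]
        floor = min(m.values())  # crude back-off for unseen words
        return sum(m.get(w, floor) for w in words)
    return max(models, key=score)

# Tiny worked example with two classes.
docs = [
    (["goal", "match", "team"], "sports"),
    (["election", "vote", "party"], "politics"),
]
models = train_class_models(docs)
print(classify(["team", "goal"], models))  # → sports
```

The independence assumption across words mirrors Naive Bayes; the paper's contribution is replacing the count-based class-conditional word probabilities with ones derived from class-specific embedding vectors.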

Cite

Text

Jin et al. "Bag-of-Embeddings for Text Classification." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Jin et al. "Bag-of-Embeddings for Text Classification." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/jin2016ijcai-bag/)

BibTeX

@inproceedings{jin2016ijcai-bag,
  title     = {{Bag-of-Embeddings for Text Classification}},
  author    = {Jin, Peng and Zhang, Yue and Chen, Xingyuan and Xia, Yunqing},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {2824--2830},
  url       = {https://mlanthology.org/ijcai/2016/jin2016ijcai-bag/}
}