Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes

Abstract

Numerous semi-supervised learning methods have been proposed to augment Multinomial Naive Bayes (MNB) using unlabeled documents, but their use in practice is often limited by implementation difficulty, inconsistent prediction performance, or high computational cost. In this paper, we propose a new, very simple semi-supervised extension of MNB, called Semi-supervised Frequency Estimate (SFE). Our experiments show that SFE consistently improves MNB with additional data (labeled or unlabeled) in terms of AUC and accuracy, which is not the case when MNB is combined with Expectation Maximization (EM). We attribute this to the fact that SFE consistently produces better conditional log likelihood values than both EM+MNB and MNB on the labeled training data.
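The abstract does not spell out SFE's estimation rule, so the sketch below rests on an assumption: that SFE keeps MNB's Laplace-smoothed estimator but replaces each labeled count f_l(w, c) with f_lu(w) · f_l(w, c) / f_l(w). Here f_lu(w) is the frequency of word w over labeled plus unlabeled documents, and f_l(w, c) / f_l(w) is the class distribution of w estimated from labeled documents only. It also computes the conditional log likelihood (CLL) on labeled data that the abstract uses to explain the gains. All function names, the whitespace tokenization, and the toy data are hypothetical, and EM+MNB is not shown.

```python
import math
from collections import Counter, defaultdict


def count_labeled(labeled_docs):
    """Per-class word counts f_l(w, c) and number of documents per class."""
    counts = defaultdict(Counter)
    class_docs = Counter()
    for tokens, label in labeled_docs:
        counts[label].update(tokens)
        class_docs[label] += 1
    return counts, class_docs


def sfe_counts(labeled_counts, unlabeled_docs):
    """Assumed SFE-style counts: f_lu(w) * f_l(w, c) / f_l(w)."""
    f_l = Counter()
    for c_counts in labeled_counts.values():
        f_l.update(c_counts)
    f_lu = Counter(f_l)  # start from labeled word frequencies
    for tokens in unlabeled_docs:
        f_lu.update(tokens)  # add unlabeled word frequencies
    # Words never seen in labeled data get no class counts (smoothing only).
    return {
        c: {w: f_lu[w] * n / f_l[w] for w, n in c_counts.items()}
        for c, c_counts in labeled_counts.items()
    }


def train_mnb(counts, class_docs, vocab):
    """Laplace-smoothed MNB: log P(c) and log P(w | c) for every w in vocab."""
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(k / n_docs) for c, k in class_docs.items()}
    log_cond = {}
    for c in class_docs:
        c_counts = counts.get(c, {})
        denom = sum(c_counts.values()) + len(vocab)
        log_cond[c] = {w: math.log((c_counts.get(w, 0) + 1) / denom) for w in vocab}
    return log_prior, log_cond


def log_posteriors(model, tokens):
    """log P(c | d) via log-sum-exp; out-of-vocabulary tokens are ignored."""
    log_prior, log_cond = model
    scores = {
        c: log_prior[c] + sum(log_cond[c][w] for w in tokens if w in log_cond[c])
        for c in log_prior
    }
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {c: s - log_z for c, s in scores.items()}


def predict(model, tokens):
    post = log_posteriors(model, tokens)
    return max(post, key=post.get)


def conditional_log_likelihood(model, labeled_docs):
    """Sum of log P(true class | d) over labeled documents."""
    return sum(log_posteriors(model, t)[y] for t, y in labeled_docs)


labeled = [("cheap pills buy now".split(), "spam"),
           ("meeting agenda attached".split(), "ham")]
unlabeled = ["buy cheap pills".split(), "agenda for meeting".split()]
vocab = {w for t, _ in labeled for w in t} | {w for t in unlabeled for w in t}

counts, class_docs = count_labeled(labeled)
mnb = train_mnb(counts, class_docs, vocab)
sfe = train_mnb(sfe_counts(counts, unlabeled), class_docs, vocab)
print(predict(sfe, "cheap pills".split()))  # spam
```

Both models share the same smoothed estimator and vocabulary, so any difference in `conditional_log_likelihood(mnb, labeled)` versus `conditional_log_likelihood(sfe, labeled)` comes only from how the word counts are formed. Unlike EM, this assumed rule needs a single pass over the unlabeled data and no iteration.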

Cite

Text

Su et al. "Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes." International Conference on Machine Learning, 2011.

Markdown

[Su et al. "Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/su2011icml-large/)

BibTeX

@inproceedings{su2011icml-large,
  title     = {{Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes}},
  author    = {Su, Jiang and Shirab, Jelber Sayyad and Matwin, Stan},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {97--104},
  url       = {https://mlanthology.org/icml/2011/su2011icml-large/}
}