Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes

Su, Jiang; Shirab, Jelber Sayyad; Matwin, Stan

Jiang Su, Jelber Sayyad Shirab, Stan Matwin

ICML 2011 pp. 97-104

Abstract

Numerous semi-supervised learning methods have been proposed to augment Multinomial Naive Bayes (MNB) using unlabeled documents, but their use in practice is often limited due to implementation difficulty, inconsistent prediction performance, or high computational cost. In this paper, we propose a new, very simple semi-supervised extension of MNB, called Semi-supervised Frequency Estimate (SFE). Our experiments show that it consistently improves MNB with additional data (labeled or unlabeled) in terms of AUC and accuracy, which is not the case when combining MNB with Expectation Maximization (EM). We attribute this to the fact that SFE consistently produces better conditional log likelihood values than both EM+MNB and MNB in labeled training data.
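The abstract names conditional log likelihood on labeled training data as the quantity that separates SFE from MNB and EM+MNB. The following is a minimal sketch of how that quantity could be computed for the plain supervised MNB baseline only; it does not implement SFE or EM, whose details are not given here. The function names, the add-alpha smoothing with `alpha=1.0`, and the toy documents are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def train_mnb(docs, labels, vocab, alpha=1.0):
    """Fit a Multinomial Naive Bayes baseline with add-alpha smoothing.

    alpha=1.0 (Laplace smoothing) is an assumed default, not the paper's setting.
    """
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in classes}
    for tokens, c in zip(docs, labels):
        word_counts[c].update(t for t in tokens if t in vocab)

    # log P(c): relative class frequency in the labeled data
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}

    # log P(w | c): smoothed relative word frequency within each class
    log_cond = {}
    for c in classes:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_cond


def log_posterior(tokens, log_prior, log_cond):
    """Return log P(c | doc) for every class, normalised via log-sum-exp."""
    scores = {
        c: log_prior[c] + sum(log_cond[c][t] for t in tokens if t in log_cond[c])
        for c in log_prior
    }
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {c: s - log_z for c, s in scores.items()}


def conditional_log_likelihood(docs, labels, log_prior, log_cond):
    """Sum of log P(true class | doc) over a labeled set (closer to 0 is better)."""
    return sum(
        log_posterior(d, log_prior, log_cond)[y] for d, y in zip(docs, labels)
    )


# Toy labeled data (hypothetical, for illustration only)
docs = [["ball", "goal", "ball"], ["vote", "law"],
        ["goal", "team"], ["law", "vote", "vote"]]
labels = ["sport", "politics", "sport", "politics"]
vocab = sorted({t for d in docs for t in d})

log_prior, log_cond = train_mnb(docs, labels, vocab)
cll = conditional_log_likelihood(docs, labels, log_prior, log_cond)
print(f"conditional log likelihood on labeled data: {cll:.4f}")
```

Under these assumptions, a semi-supervised variant would be compared against the baseline by re-estimating `log_cond` with its own procedure and checking whether the value returned by `conditional_log_likelihood` on the same labeled set moves closer to zero.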

Cite

Text

Su et al. "Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes." International Conference on Machine Learning, 2011.

Markdown

[Su et al. "Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes." International Conference on Machine Learning, 2011.](https://mlanthology.org/icml/2011/su2011icml-large/)

BibTeX

@inproceedings{su2011icml-large,
  title     = {{Large Scale Text Classification Using Semisupervised Multinomial Naive Bayes}},
  author    = {Su, Jiang and Shirab, Jelber Sayyad and Matwin, Stan},
  booktitle = {International Conference on Machine Learning},
  year      = {2011},
  pages     = {97--104},
  url       = {https://mlanthology.org/icml/2011/su2011icml-large/}
}