Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Abstract

Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled data, such as EM and Co-Training, are mostly applicable to classification tasks with a small number of classes and do not scale well to large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.
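To make the ECOC decomposition step concrete, here is a minimal Python sketch of encoding a multiclass problem as binary problems and decoding by nearest codeword. It is an illustration under stated assumptions, not the paper's implementation: the 4-class exhaustive code matrix, the toy 2-D data, and the names `CODE_MATRIX`, `fit_nearest_centroid`, `train_ecoc`, and `predict_ecoc` are all hypothetical. Where the paper learns each binary problem with Co-Training over labeled and unlabeled text, this sketch plugs in a hypothetical nearest-centroid learner trained on labeled points only, so it shows the decomposition and error correction but not the use of unlabeled data.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]
BinaryPredictor = Callable[[Vector], int]
BinaryFitter = Callable[[List[Vector], List[int]], BinaryPredictor]

# Illustrative 4-class exhaustive code (an assumption, not from the paper):
# one row (codeword) per class, one column per binary sub-problem. Every pair
# of rows differs in 4 of the 7 bits, so nearest-codeword decoding still
# recovers the class when any single bit is wrong.
CODE_MATRIX: List[List[int]] = [
    [1, 1, 1, 1, 1, 1, 1],  # class 0
    [0, 0, 0, 0, 1, 1, 1],  # class 1
    [0, 0, 1, 1, 0, 0, 1],  # class 2
    [0, 1, 0, 1, 0, 1, 0],  # class 3
]


def fit_nearest_centroid(X: List[Vector], bits: List[int]) -> BinaryPredictor:
    """Hypothetical per-bit learner; the paper uses Co-Training for each bit instead."""

    def mean(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(u: Vector, v: Vector) -> float:
        return sum((p - q) ** 2 for p, q in zip(u, v))

    c_pos = mean([x for x, bit in zip(X, bits) if bit == 1])
    c_neg = mean([x for x, bit in zip(X, bits) if bit == 0])
    # Predict 1 only when strictly closer to the positive centroid (ties -> 0).
    return lambda x: 1 if sq_dist(x, c_pos) < sq_dist(x, c_neg) else 0


def train_ecoc(X: List[Vector], y: List[int],
               fit_binary: BinaryFitter = fit_nearest_centroid) -> List[BinaryPredictor]:
    """Decompose the task: column j relabels every example with bit j of its class codeword."""
    n_bits = len(CODE_MATRIX[0])
    return [fit_binary(X, [CODE_MATRIX[label][j] for label in y]) for j in range(n_bits)]


def predict_ecoc(predictors: List[BinaryPredictor], x: Vector) -> int:
    """Run every binary classifier, then return the class with the nearest codeword (Hamming)."""
    bits = [predict(x) for predict in predictors]

    def hamming(row: List[int]) -> int:
        return sum(b != r for b, r in zip(bits, row))

    return min(range(len(CODE_MATRIX)), key=lambda c: hamming(CODE_MATRIX[c]))


if __name__ == "__main__":
    # Toy 2-D data: three labeled points near each of four class centers.
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    X_train: List[Vector] = []
    y_train: List[int] = []
    for label, (cx, cy) in enumerate(centers):
        for dx, dy in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]:
            X_train.append((cx + dx, cy + dy))
            y_train.append(label)
    predictors = train_ecoc(X_train, y_train)
    print([predict_ecoc(predictors, (cx + 0.5, cy + 0.5)) for cx, cy in centers])  # prints [0, 1, 2, 3]
```

With this toy data, column 1 (classes 0 and 3 versus classes 1 and 2) is an XOR-style split whose positive and negative centroids coincide, so the stand-in learner always outputs 0 and gets that bit wrong for classes 0 and 3. The other six bits are correct for the test points, and the code's minimum distance of 4 lets the decoder recover the right label anyway. The `fit_binary` parameter is the point where a semi-supervised learner would be swapped in, which is the role the abstract assigns to Co-Training.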

Cite

Text

Ghani. "Combining Labeled and Unlabeled Data for MultiClass Text Categorization." International Conference on Machine Learning, 2002.

Markdown

[Ghani. "Combining Labeled and Unlabeled Data for MultiClass Text Categorization." International Conference on Machine Learning, 2002.](https://mlanthology.org/icml/2002/ghani2002icml-combining/)

BibTeX

@inproceedings{ghani2002icml-combining,
  title     = {{Combining Labeled and Unlabeled Data for MultiClass Text Categorization}},
  author    = {Ghani, Rayid},
  booktitle = {International Conference on Machine Learning},
  year      = {2002},
  pages     = {187--194},
  url       = {https://mlanthology.org/icml/2002/ghani2002icml-combining/}
}