Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme
Abstract
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, in many text classification tasks, labeled training documents are expensive to obtain, while unlabeled documents are readily available in large quantities. This paper describes a general framework for extending any text learning algorithm to utilize unlabeled documents in addition to labeled documents using an Expectation-Maximization-like scheme. Our instantiation of this partially supervised classification framework with a similarity-based single prototype classifier achieves encouraging results on two real-world text datasets. Classification error is reduced by up to 38% when using unlabeled documents in addition to labeled documents.
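The abstract's EM-like scheme with a similarity-based single prototype classifier can be illustrated with a short sketch: train class prototypes (centroids) on the labeled documents, tentatively label the unlabeled documents by nearest prototype (E-step), then rebuild the prototypes from all documents (M-step), and iterate. This is a minimal illustration of the general idea, not the paper's exact algorithm; the function names and the fixed iteration count are assumptions.

```python
import numpy as np

def prototypes(X, y, n_classes):
    # One prototype per class: the centroid of that class's document vectors.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def assign(X, P):
    # Label each document with the class of its most cosine-similar prototype.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    return (Xn @ Pn.T).argmax(axis=1)

def em_like(X_lab, y_lab, X_unlab, n_classes, n_iter=10):
    # Initialize prototypes from the labeled documents only.
    P = prototypes(X_lab, y_lab, n_classes)
    y_unlab = assign(X_unlab, P)
    for _ in range(n_iter):
        # E-step: tentatively label the unlabeled documents.
        y_unlab = assign(X_unlab, P)
        # M-step: rebuild prototypes from labeled + tentatively labeled docs.
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        P = prototypes(X_all, y_all, n_classes)
    return P, y_unlab
```

On a toy two-class problem, e.g. `em_like(np.array([[1.,0.],[0.,1.]]), np.array([0,1]), np.array([[.9,.1],[.1,.9]]), 2)`, the unlabeled points are pulled to the nearer centroid; in practice the X rows would be tf-idf vectors of documents.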
Cite
Text
Lanquillon. "Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme." European Conference on Machine Learning, 2000. doi:10.1007/3-540-45164-1_24
Markdown
[Lanquillon. "Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme." European Conference on Machine Learning, 2000.](https://mlanthology.org/ecmlpkdd/2000/lanquillon2000ecml-partially/) doi:10.1007/3-540-45164-1_24
BibTeX
@inproceedings{lanquillon2000ecml-partially,
title = {{Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme}},
author = {Lanquillon, Carsten},
booktitle = {European Conference on Machine Learning},
year = {2000},
pages = {229--237},
doi = {10.1007/3-540-45164-1_24},
url = {https://mlanthology.org/ecmlpkdd/2000/lanquillon2000ecml-partially/}
}