Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme
Abstract
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, in many text classification tasks, labeled training documents are expensive to obtain, while unlabeled documents are readily available in large quantities. This paper describes a general framework for extending any text learning algorithm to utilize unlabeled documents in addition to labeled documents using an Expectation-Maximization-like scheme. Our instantiation of this partially supervised classification framework with a similarity-based single prototype classifier achieves encouraging results on two real-world text datasets. Classification error is reduced by up to 38% when using unlabeled documents in addition to labeled documents.
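The abstract's EM-like scheme with a similarity-based single prototype classifier can be illustrated with a short sketch: train class prototypes (centroids) on the labeled documents, tentatively label the unlabeled documents by nearest prototype (E-step), then rebuild the prototypes from all documents (M-step), and iterate. This is a minimal illustration of the general idea, not the paper's exact algorithm; the function names and the fixed iteration count are assumptions.

```python
import numpy as np

def prototypes(X, y, n_classes):
    # One prototype per class: the centroid of that class's document vectors.
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def assign(X, P):
    # Label each document with the class of its most cosine-similar prototype.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    return (Xn @ Pn.T).argmax(axis=1)

def em_like(X_lab, y_lab, X_unlab, n_classes, n_iter=10):
    # Initialize prototypes from the labeled documents only.
    P = prototypes(X_lab, y_lab, n_classes)
    y_unlab = assign(X_unlab, P)
    for _ in range(n_iter):
        # E-step: tentatively label the unlabeled documents.
        y_unlab = assign(X_unlab, P)
        # M-step: rebuild prototypes from labeled + tentatively labeled docs.
        X_all = np.vstack([X_lab, X_unlab])
        y_all = np.concatenate([y_lab, y_unlab])
        P = prototypes(X_all, y_all, n_classes)
    return P, y_unlab
```

On a toy two-class problem, e.g. `em_like(np.array([[1.,0.],[0.,1.]]), np.array([0,1]), np.array([[.9,.1],[.1,.9]]), 2)`, the unlabeled points are pulled to the nearer centroid; in practice the X rows would be tf-idf vectors of documents.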
Cite
Text
Lanquillon. "Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme." European Conference on Machine Learning, 2000. doi:10.1007/3-540-45164-1_24
Markdown
[Lanquillon. "Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme." European Conference on Machine Learning, 2000.](https://mlanthology.org/ecmlpkdd/2000/lanquillon2000ecml-partially/) doi:10.1007/3-540-45164-1_24
BibTeX
@inproceedings{lanquillon2000ecml-partially,
title = {{Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme}},
author = {Lanquillon, Carsten},
booktitle = {European Conference on Machine Learning},
year = {2000},
pages = {229--237},
doi = {10.1007/3-540-45164-1_24},
url = {https://mlanthology.org/ecmlpkdd/2000/lanquillon2000ecml-partially/}
}