Semi-Supervised Learning for Blog Classification
Abstract
Blog classification (e.g., identifying bloggers' gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning method for blog classification, effectively using unlabeled data, is proposed. In this method, entries from the same blog are assumed to have the same characteristics. With this assumption, the proposed method captures the characteristics of each blog, such as writing style and topic, and uses these characteristics to improve the classification accuracy.
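The abstract does not spell out the algorithm, so the following is only a minimal sketch of one simple way to exploit the assumption that entries from the same blog share the same characteristics. It swaps in a plain self-training loop around a naive Bayes classifier, in which every entry of an unlabeled blog receives a single blog-level pseudo-label. It is not the authors' method; the function names, the "teen"/"adult" labels, and the toy entries are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train_nb(entries):
    """Collect multinomial naive Bayes counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in entries:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def log_scores(model, text):
    """Log-probability score of each label for one entry (add-one smoothing)."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label, count in label_counts.items():
        n_tokens = sum(word_counts[label].values())
        score = math.log(count / total)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (n_tokens + len(vocab)))
        scores[label] = score
    return scores


def label_blogs(model, blogs):
    """Assign ONE label per blog by summing entry scores (same-blog assumption)."""
    result = {}
    for blog_id, texts in blogs.items():
        summed = defaultdict(float)
        for text in texts:
            for label, score in log_scores(model, text).items():
                summed[label] += score
        result[blog_id] = max(summed, key=summed.get)
    return result


def self_train(labeled, unlabeled_blogs, rounds=2):
    """Retrain on labeled entries plus blog-level pseudo-labeled entries."""
    model = train_nb(labeled)
    for _ in range(rounds):
        guesses = label_blogs(model, unlabeled_blogs)
        pseudo = [(t, guesses[b]) for b, texts in unlabeled_blogs.items() for t in texts]
        model = train_nb(labeled + pseudo)
    return model, label_blogs(model, unlabeled_blogs)


# Hypothetical toy data: two labeled entries and two unlabeled blogs.
labeled = [("homework exam school", "teen"), ("mortgage office meeting", "adult")]
unlabeled_blogs = {
    "blog1": ["school exam tomorrow", "my band practice"],
    "blog2": ["office meeting ran late", "band practice tonight"],
}
model, blog_labels = self_train(labeled, unlabeled_blogs)
print(blog_labels)
```

In this sketch, an entry such as "my band practice" carries no evidence on its own, but it inherits a label from the other entry in the same blog, which is the kind of benefit the same-blog assumption is meant to provide.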
Cite
Text
Ikeda et al. "Semi-Supervised Learning for Blog Classification." AAAI Conference on Artificial Intelligence, 2008.
Markdown
[Ikeda et al. "Semi-Supervised Learning for Blog Classification." AAAI Conference on Artificial Intelligence, 2008.](https://mlanthology.org/aaai/2008/ikeda2008aaai-semi/)
BibTeX
@inproceedings{ikeda2008aaai-semi,
title = {{Semi-Supervised Learning for Blog Classification}},
author = {Ikeda, Daisuke and Takamura, Hiroya and Okumura, Manabu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2008},
pages = {1156--1161},
url = {https://mlanthology.org/aaai/2008/ikeda2008aaai-semi/}
}