Constrained Coclustering for Textual Documents

Abstract

In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
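As a rough illustration of how an information-theoretic co-clustering objective can be combined with pairwise constraints, the sketch below scores a joint document and word labeling by the mutual information lost when the document-word joint distribution is collapsed to cluster level, plus a penalty for violated must-link and cannot-link pairs. It is a minimal sketch under stated assumptions, not the paper's model: the function names, the flat count-based penalty, and the `weight` parameter are all hypothetical, and the two-sided HMRF potentials and the alternating EM updates are not reproduced here.

```python
import math


def mutual_information(joint):
    """Mutual information (in nats) of a joint distribution given as a 2-D list."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


def aggregate(joint, doc_labels, word_labels, k_docs, k_words):
    """Sum the joint mass into a (k_docs x k_words) cluster-level joint distribution."""
    agg = [[0.0] * k_words for _ in range(k_docs)]
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            agg[doc_labels[i]][word_labels[j]] += p
    return agg


def constraint_penalty(labels, must_link=(), cannot_link=(), weight=1.0):
    """Assumed penalty: a flat cost per violated pairwise constraint."""
    violated = sum(1 for a, b in must_link if labels[a] != labels[b])
    violated += sum(1 for a, b in cannot_link if labels[a] == labels[b])
    return weight * violated


def objective(joint, doc_labels, word_labels, k_docs, k_words,
              doc_ml=(), doc_cl=(), word_ml=(), word_cl=(), weight=1.0):
    """Mutual-information loss plus constraint penalties on both sides (lower is better)."""
    clustered = aggregate(joint, doc_labels, word_labels, k_docs, k_words)
    loss = mutual_information(joint) - mutual_information(clustered)
    loss += constraint_penalty(doc_labels, doc_ml, doc_cl, weight)
    loss += constraint_penalty(word_labels, word_ml, word_cl, weight)
    return loss


# Toy 4-document x 4-word count matrix with two clean blocks (made-up data).
counts = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]
total = sum(map(sum, counts))
joint = [[c / total for c in r] for r in counts]

# A labeling that matches the blocks and satisfies the must-link pair (0, 1).
good = objective(joint, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2, doc_ml=[(0, 1)])
# A labeling that splits the blocks and violates the same must-link pair.
bad = objective(joint, [0, 1, 0, 1], [0, 0, 1, 1], 2, 2, doc_ml=[(0, 1)])
print(good < bad)
```

An alternating scheme would hold the word labels fixed while reassigning documents to lower this score, then swap roles; the sketch only evaluates candidate labelings and does not perform that search.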

Cite

Text

Song et al. "Constrained Coclustering for Textual Documents." AAAI Conference on Artificial Intelligence, 2010. doi:10.1609/AAAI.V24I1.7680

Markdown

[Song et al. "Constrained Coclustering for Textual Documents." AAAI Conference on Artificial Intelligence, 2010.](https://mlanthology.org/aaai/2010/song2010aaai-constrained/) doi:10.1609/AAAI.V24I1.7680

BibTeX

@inproceedings{song2010aaai-constrained,
  title     = {{Constrained Coclustering for Textual Documents}},
  author    = {Song, Yangqiu and Pan, Shimei and Liu, Shixia and Wei, Furu and Zhou, Michelle X. and Qian, Weihong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2010},
  pages     = {581--586},
  doi       = {10.1609/AAAI.V24I1.7680},
  url       = {https://mlanthology.org/aaai/2010/song2010aaai-constrained/}
}