Self-Taught Clustering

Abstract

This paper focuses on a new clustering task called self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning, which aims to cluster a small collection of unlabeled target data with the help of a large amount of unlabeled auxiliary data. The target and auxiliary data may differ in topic distribution. We show that even when the target data are insufficient for learning a high-quality feature representation, useful features can be learned with the help of the auxiliary data, and the target data can then be clustered effectively on those features. We propose a co-clustering-based self-taught clustering algorithm that clusters the target and auxiliary data simultaneously, so that the feature representation from the auxiliary data influences the target data through a common set of features. Under the new data representation, clustering of the target data can be improved. Our experiments on image clustering show that our algorithm can greatly outperform several state-of-the-art clustering methods when utilizing irrelevant unlabeled auxiliary data.
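
The idea of letting auxiliary data shape a common feature representation can be illustrated with a rough sketch. The code below is not the algorithm from the paper: it swaps the co-clustering procedure for a simplified, single-pass stand-in built on plain k-means, and the names `kmeans`, `shared_feature_clustering`, `X_target`, `X_aux`, and `n_feature_clusters` are all hypothetical. It assumes both data sets are given as instance-by-feature matrices over the same features.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one integer label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels


def shared_feature_clustering(X_target, X_aux, n_clusters,
                              n_feature_clusters, seed=0):
    """Simplified stand-in (assumption, not the paper's method):
    group features using target and auxiliary data together, then
    cluster the target instances in the grouped-feature space."""
    # Describe each feature by its column over target AND auxiliary rows,
    # so the (larger) auxiliary set influences how features are grouped.
    stacked = np.vstack([X_target, X_aux]).astype(float)
    cols = stacked.T
    norms = np.linalg.norm(cols, axis=1, keepdims=True)
    cols = cols / np.where(norms == 0, 1.0, norms)
    feature_labels = kmeans(cols, n_feature_clusters, seed=seed)

    # Re-represent each target instance by summing the features that
    # fell into the same feature cluster.
    reduced = np.zeros((X_target.shape[0], n_feature_clusters))
    for j in range(n_feature_clusters):
        reduced[:, j] = X_target[:, feature_labels == j].sum(axis=1)

    target_labels = kmeans(reduced, n_clusters, seed=seed)
    return target_labels, feature_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_target = rng.random((6, 8))   # small target set (synthetic)
    X_aux = rng.random((40, 8))     # larger auxiliary set (synthetic)
    t_labels, f_labels = shared_feature_clustering(X_target, X_aux, 2, 3)
    print(t_labels, f_labels)
```

In this sketch the auxiliary data never receives cluster labels of its own; it only affects how features are grouped. The approach described in the abstract instead clusters target data, auxiliary data, and features simultaneously, which this one-pass version does not attempt.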

Cite

Text

Dai et al. "Self-Taught Clustering." International Conference on Machine Learning, 2008. doi:10.1145/1390156.1390182

Markdown

[Dai et al. "Self-Taught Clustering." International Conference on Machine Learning, 2008.](https://mlanthology.org/icml/2008/dai2008icml-self/) doi:10.1145/1390156.1390182

BibTeX

@inproceedings{dai2008icml-self,
  title     = {{Self-Taught Clustering}},
  author    = {Dai, Wenyuan and Yang, Qiang and Xue, Gui-Rong and Yu, Yong},
  booktitle = {International Conference on Machine Learning},
  year      = {2008},
  pages     = {200-207},
  doi       = {10.1145/1390156.1390182},
  url       = {https://mlanthology.org/icml/2008/dai2008icml-self/}
}