Unsupervised Clustering Using Pseudo-Semi-Supervised Learning
Abstract
In this paper, we propose a framework that leverages semi-supervised models to improve unsupervised clustering performance. Doing so first requires automatically generating labels, called pseudo-labels. We find that prior approaches for generating pseudo-labels hurt clustering performance because of their low accuracy. Instead, we use an ensemble of deep networks to construct a similarity graph, from which we extract high-accuracy pseudo-labels. We iterate between finding high-quality pseudo-labels with the ensemble and training the semi-supervised model, which yields continued improvement. We show that our approach outperforms state-of-the-art clustering results on multiple image and text datasets. For example, we achieve 54.6% accuracy on CIFAR-10 and 43.9% on 20news, outperforming the state of the art by 8-12% in absolute terms.
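As a rough illustration of the ensemble idea in the abstract, the sketch below derives pseudo-labels from the agreement of several clustering models. It is a simplified stand-in, not the paper's algorithm: it assumes strict unanimity (two points are linked only if every model puts them in the same cluster) and keeps the largest agreed-upon groups, whereas the paper's similarity graph and label extraction may differ. The function name `consensus_pseudo_labels` and its parameters are hypothetical.

```python
import numpy as np


def consensus_pseudo_labels(assignments, num_clusters):
    """Hypothetical helper: pseudo-labels from unanimous ensemble agreement.

    assignments: int array of shape (n_models, n_points); row m holds the
        cluster id that model m assigns to each point. Cluster ids do not
        need to be aligned across models.
    num_clusters: number of pseudo-label classes to keep.
    Returns an int array of shape (n_points,); -1 marks unlabeled points.
    """
    assignments = np.asarray(assignments)

    # Two points are linked only if every model co-clusters them, which is
    # the same as their columns of cluster ids being identical.
    _, group_ids, counts = np.unique(
        assignments.T, axis=0, return_inverse=True, return_counts=True
    )
    group_ids = group_ids.reshape(-1)

    # Assumption: keep the num_clusters largest unanimous groups and leave
    # every other point unlabeled rather than guessing.
    largest = np.argsort(-counts, kind="stable")[:num_clusters]

    labels = np.full(assignments.shape[1], -1, dtype=int)
    for new_label, group in enumerate(largest):
        labels[group_ids == group] = new_label
    return labels


if __name__ == "__main__":
    # Two models, six points; the models disagree on points 2 and 5.
    ensemble = [[0, 0, 0, 1, 1, 2],
                [1, 1, 0, 2, 2, 0]]
    print(consensus_pseudo_labels(ensemble, num_clusters=2))
```

Points left at -1 would be treated as unlabeled data by the semi-supervised model, and the labeling step could be repeated after each round of training.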
Cite
Text
Gupta et al. "Unsupervised Clustering Using Pseudo-Semi-Supervised Learning." International Conference on Learning Representations, 2020.
Markdown
[Gupta et al. "Unsupervised Clustering Using Pseudo-Semi-Supervised Learning." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/gupta2020iclr-unsupervised/)
BibTeX
@inproceedings{gupta2020iclr-unsupervised,
title = {{Unsupervised Clustering Using Pseudo-Semi-Supervised Learning}},
author = {Gupta, Divam and Ramjee, Ramachandran and Kwatra, Nipun and Sivathanu, Muthian},
booktitle = {International Conference on Learning Representations},
year = {2020},
url = {https://mlanthology.org/iclr/2020/gupta2020iclr-unsupervised/}
}