Unsupervised Clustering Using Pseudo-Semi-Supervised Learning

Gupta, Divam; Ramjee, Ramachandran; Kwatra, Nipun; Sivathanu, Muthian

ICLR 2020

Abstract

We propose a framework that leverages semi-supervised models to improve unsupervised clustering performance. Semi-supervised models need labels, so we first generate them automatically; we call these pseudo-labels. We find that prior approaches to generating pseudo-labels hurt clustering performance because the labels they produce have low accuracy. Instead, we use an ensemble of deep networks to construct a similarity graph, from which we extract high-accuracy pseudo-labels. We iterate these two steps, extracting high-quality pseudo-labels from the ensemble and training the semi-supervised model on them, which yields continued improvement. Our approach outperforms state-of-the-art clustering results on multiple image and text datasets. For example, we achieve 54.6% accuracy on CIFAR-10 and 43.9% on 20news, outperforming the state of the art by 8-12% in absolute terms.
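The idea of extracting pseudo-labels from an ensemble similarity graph can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each ensemble member outputs hard cluster ids, connects two samples only when every member places them in the same cluster, and then greedily picks the highest-degree sample and its neighbours as one pseudo-labeled group. The function names `agreement_graph` and `extract_pseudo_labels`, the `min_size` parameter, and the greedy extraction rule are all hypothetical choices made for illustration.

```python
import numpy as np


def agreement_graph(assignments):
    """Build a similarity graph from ensemble cluster assignments.

    assignments: (n_models, n_samples) array of hard cluster ids.
    Returns a boolean (n_samples, n_samples) adjacency matrix that is True
    where every model puts the two samples in the same cluster.
    Cluster ids do not need to match across models.
    """
    a = np.asarray(assignments)
    same = a[:, :, None] == a[:, None, :]  # (n_models, n, n) pairwise agreement
    adj = same.all(axis=0)                 # edge only if all models agree
    np.fill_diagonal(adj, False)           # no self-loops
    return adj


def extract_pseudo_labels(adj, n_clusters, min_size=2):
    """Greedily extract up to n_clusters groups from the graph.

    Assumed rule (illustrative only): repeatedly take the unassigned sample
    with the most unassigned neighbours, label it and those neighbours as one
    cluster, and remove them. Samples left over keep the label -1 and would
    be treated as unlabeled by the semi-supervised model.
    """
    n = adj.shape[0]
    labels = np.full(n, -1)
    remaining = np.ones(n, dtype=bool)
    for k in range(n_clusters):
        degree = (adj & remaining[None, :]).sum(axis=1)
        degree[~remaining] = -1            # never pick an assigned sample
        seed = int(degree.argmax())
        members = adj[seed] & remaining
        members[seed] = True
        if members.sum() < min_size:       # nothing confident left to extract
            break
        labels[members] = k
        remaining &= ~members
    return labels


if __name__ == "__main__":
    # Two hypothetical ensemble members clustering six samples.
    ensemble = [[0, 0, 0, 1, 1, 2],
                [1, 1, 0, 2, 2, 0]]
    print(extract_pseudo_labels(agreement_graph(ensemble), n_clusters=2))
```

In this toy input the two members agree only on the pairs (0, 1) and (3, 4), so those four samples receive pseudo-labels and the other two stay unlabeled. Requiring unanimous agreement trades coverage for label accuracy, which is the property the abstract identifies as important.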

Cite

Text

Gupta et al. "Unsupervised Clustering Using Pseudo-Semi-Supervised Learning." International Conference on Learning Representations, 2020.

Markdown

[Gupta et al. "Unsupervised Clustering Using Pseudo-Semi-Supervised Learning." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/gupta2020iclr-unsupervised/)

BibTeX

@inproceedings{gupta2020iclr-unsupervised,
  title     = {{Unsupervised Clustering Using Pseudo-Semi-Supervised Learning}},
  author    = {Gupta, Divam and Ramjee, Ramachandran and Kwatra, Nipun and Sivathanu, Muthian},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/gupta2020iclr-unsupervised/}
}