Clustering by Maximizing Mutual Information Across Views
Abstract
We propose a novel framework for image clustering that incorporates joint representation learning and clustering. Our method consists of two heads that share the same backbone network - a "representation learning" head and a "clustering" head. The "representation learning" head captures fine-grained patterns of objects at the instance level which serve as clues for the "clustering" head to extract coarse-grained information that separates objects into clusters. The whole model is trained in an end-to-end manner by minimizing the weighted sum of two sample-oriented contrastive losses applied to the outputs of the two heads. To ensure that the contrastive loss corresponding to the "clustering" head is optimal, we introduce a novel critic function called "log-of-dot-product". Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets, improving over the best baseline by about 5-7% in accuracy on CIFAR10/20, STL10, and ImageNet-Dogs. Further, the "two-stage" variant of our method also achieves better results than baselines on three challenging ImageNet subsets.
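As a rough illustration of the two-head setup described in the abstract, the sketch below shows a shared backbone with a "representation learning" head and a "clustering" head, trained with a weighted sum of two InfoNCE-style contrastive losses, using a log-of-dot-product critic over cluster probabilities for the clustering head. All names (`TwoHeadModel`, `rep_head`, `clu_head`, `info_nce`) and hyperparameters (dimensions, temperature, loss weight) are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoHeadModel(nn.Module):
    """Shared backbone with a 'representation learning' head and a 'clustering' head.

    Sketch only: layer sizes and head shapes are assumed for illustration.
    """

    def __init__(self, backbone, feat_dim=512, proj_dim=128, num_clusters=10):
        super().__init__()
        self.backbone = backbone                           # e.g. a ResNet trunk returning feat_dim features
        self.rep_head = nn.Linear(feat_dim, proj_dim)      # instance-level projection
        self.clu_head = nn.Linear(feat_dim, num_clusters)  # cluster-assignment logits

    def forward(self, x):
        h = self.backbone(x)
        z = F.normalize(self.rep_head(h), dim=1)  # unit-norm features for the instance-level loss
        p = F.softmax(self.clu_head(h), dim=1)    # soft cluster assignments
        return z, p


def info_nce(scores):
    """Sample-oriented contrastive loss: row i's positive is column i (its other view)."""
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)


def contrastive_losses(model, view1, view2, tau=0.5, lam=1.0):
    """Weighted sum of the two contrastive losses (tau and lam are assumed values)."""
    z1, p1 = model(view1)
    z2, p2 = model(view2)
    # Representation head: cosine-similarity critic with temperature.
    rep_loss = info_nce(z1 @ z2.t() / tau)
    # Clustering head: "log-of-dot-product" critic over cluster probabilities.
    clu_loss = info_nce(torch.log(p1 @ p2.t() + 1e-8))
    return clu_loss + lam * rep_loss
```

In this sketch, `view1` and `view2` would be two augmentations of the same image batch, so each row's matching column is its positive pair while all other samples in the batch serve as negatives.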
Cite
Text
Do et al. "Clustering by Maximizing Mutual Information Across Views." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00978
Markdown
[Do et al. "Clustering by Maximizing Mutual Information Across Views." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/do2021iccv-clustering/) doi:10.1109/ICCV48922.2021.00978
BibTeX
@inproceedings{do2021iccv-clustering,
title = {{Clustering by Maximizing Mutual Information Across Views}},
author = {Do, Kien and Tran, Truyen and Venkatesh, Svetha},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {9928-9938},
doi = {10.1109/ICCV48922.2021.00978},
url = {https://mlanthology.org/iccv/2021/do2021iccv-clustering/}
}