An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?
Abstract
Recent research in clustering face embeddings has found that unsupervised, shallow, heuristic-based methods---including $k$-means and hierarchical agglomerative clustering---underperform supervised, deep, inductive methods. While the reported improvements are indeed impressive, experiments are mostly limited to face datasets, where the clustered embeddings are highly discriminative or well-separated by class (Recall@1 above 90% and often near ceiling), and the experimental methodology seemingly favors the deep methods. We conduct an empirical study of 14 clustering methods on two popular non-face datasets---Cars196 and Stanford Online Products---and obtain robust but contentious findings. Notably, deep methods are surprisingly fragile for embeddings with more uncertainty, where they underperform the shallow, heuristic-based methods. We believe our benchmarks broaden the scope of supervised clustering methods beyond the face domain and can serve as a foundation on which these methods could be improved.
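As a point of reference, the shallow, heuristic-based baselines the abstract mentions are simple to run on pretrained embeddings. The sketch below is a minimal NumPy implementation of $k$-means on toy "embeddings" (it is not the paper's code, and the blob data stands in for real Cars196 or Stanford Online Products features):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: the kind of shallow, unsupervised baseline
    the paper compares against deep, supervised clustering methods."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each embedding to its nearest center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy stand-in for well-separated embeddings: two tight Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
               rng.normal(3.0, 0.1, (50, 8))])
labels = kmeans(X, k=2)
# With well-separated classes, each blob collapses to one cluster;
# the paper's point is that this regime flatters all methods.
print(len(set(labels[:50])), len(set(labels[50:])))
```

On well-separated embeddings like these (analogous to face embeddings with Recall@1 near ceiling), even this baseline recovers the classes; the interesting regime studied in the paper is when embeddings carry more uncertainty.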
Cite
Scott et al. "An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?" NeurIPS 2022 Workshops: ICBINB, 2022.
@inproceedings{scott2022neuripsw-empirical,
title = {{An Empirical Study on Clustering Pretrained Embeddings: Is Deep Strictly Better?}},
author = {Scott, Tyler R. and Liu, Ting and Mozer, Michael Curtis and Gallagher, Andrew},
booktitle = {NeurIPS 2022 Workshops: ICBINB},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/scott2022neuripsw-empirical/}
}