Samples Are Not Equal: A Sample Selection Approach for Deep Clustering

Abstract

Deep clustering has recently achieved remarkable progress across various domains. However, existing clustering methods typically treat all samples equally, neglecting the inherent differences in their feature patterns and learning states. Such redundant learning often drives models to overemphasize simple feature patterns in high-density regions, weakening their ability to capture complex yet diverse ones in low-density regions. To address this issue, we propose a novel plug-in designed to mitigate overfitting to simple and redundant feature patterns while encouraging the learning of more complex yet diverse ones. Specifically, we introduce a density-aware clustering head initialization strategy that adaptively adjusts each sample's contribution to cluster prototypes according to its local density in the feature space. This strategy mitigates the bias towards high-density regions and encourages a more comprehensive attention on medium- and low-density ones. Furthermore, we design a dynamic sample selection strategy that evaluates the learning state of samples based on the feature consistency and pseudo-label stability. By removing sufficiently learned samples and prioritizing unstable ones, this strategy adaptively reallocates training resources, enabling the model to consistently focus on samples that remain under-learned throughout training. Our method can be integrated as a plug-in into a wide range of deep clustering architectures. Extensive experiments on multiple benchmark datasets demonstrate that our method improves clustering accuracy by up to $\textbf{6.1}$\% and enhances training efficiency by up to $\textbf{1.3$\times$}$. Code is available at [https://github.com/notoaudrey/Samples-Are-Not-Equal](https://github.com/notoaudrey/Samples-Are-Not-Equal).

Cite

Text

Jiao et al. "Samples Are Not Equal: A Sample Selection Approach for Deep Clustering." International Conference on Learning Representations, 2026.

Markdown

[Jiao et al. "Samples Are Not Equal: A Sample Selection Approach for Deep Clustering." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/jiao2026iclr-samples/)

BibTeX

@inproceedings{jiao2026iclr-samples,
  title     = {{Samples Are Not Equal: A Sample Selection Approach for Deep Clustering}},
  author    = {Jiao, Zhengxing and Hou, Yaxin and Ma, Jun and Li, Yuhang and Ding, Ding and Jia, Yuheng and Liu, Hui and Hou, Junhui},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/jiao2026iclr-samples/}
}