Rethinking Coreset Selection: The Surprising Effectiveness of Soft Labels

Abstract

Data-efficient deep learning is an emerging branch of deep learning that aims to minimize the amount of labeled data required for training. Coreset selection is one such approach. It selects a representative subset of the original dataset that achieves comparable generalization performance at a much lower computational and storage cost. Dataset Distillation (DD), another branch of data-efficient deep learning, pursues the same goal by distilling a small synthetic dataset from the original one. DD methods exploit soft labels (probabilistic target labels instead of traditional one-hot labels), which yield significant improvements over hard labels; to the best of our knowledge, however, no comparable study exists for coreset selection. In this work, we study for the first time the impact of soft labels on generalization accuracy across various coreset selection algorithms for image classification. Soft labels improve the performance of every method, but, surprisingly, random selection with soft labels performs on par with or better than existing coreset selection approaches. Our findings suggest that future coreset selection algorithms should benchmark against random selection with soft labels as an important baseline.
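The abstract does not say how the soft labels are produced or how the random subset is drawn. The following is a minimal sketch of what a "random selection with soft labels" baseline could look like. It assumes that the soft labels are temperature-softened softmax outputs of a pretrained teacher network, a common choice in the dataset distillation literature that may not match the authors' setup. It also assumes that the student is trained with a KL-divergence loss against those soft labels instead of cross-entropy against one-hot labels. All function names and the temperature value are hypothetical and are used only for illustration.

```python
import math
import random

def random_coreset(num_examples, budget, seed=0):
    """Uniformly sample `budget` distinct training indices (the random-selection baseline)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_examples), budget))

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a flatter (softer) distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions (assumed soft-label objective)."""
    p = softmax(teacher_logits, temperature)  # soft label from the (hypothetical) teacher
    q = softmax(student_logits, temperature)  # student prediction at the same temperature
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)

def hard_label_loss(student_logits, label):
    """Standard cross-entropy against a one-hot label, for comparison."""
    q = softmax(student_logits)
    return -math.log(q[label])

if __name__ == "__main__":
    subset = random_coreset(50_000, 5_000, seed=0)  # e.g. a 10% budget of a 50k-image training set
    teacher_logits = [2.0, 0.5, -1.0]               # hypothetical teacher output for one image
    print(softmax(teacher_logits, temperature=4.0))  # soft label: non-zero mass on every class
    print(soft_label_loss(teacher_logits, teacher_logits))  # 0.0 when the student matches the teacher
```

In this sketch, the selection step is deliberately trivial. All of the extra supervision comes from replacing the one-hot target with the teacher's full class distribution. That separation is what makes random selection with soft labels a natural baseline against which to measure more elaborate coreset selection criteria.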

Cite

Text

Mohanty et al. "Rethinking Coreset Selection: The Surprising Effectiveness of Soft Labels." Transactions on Machine Learning Research, 2026.

Markdown

[Mohanty et al. "Rethinking Coreset Selection: The Surprising Effectiveness of Soft Labels." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/mohanty2026tmlr-rethinking/)

BibTeX

@article{mohanty2026tmlr-rethinking,
  title     = {{Rethinking Coreset Selection: The Surprising Effectiveness of Soft Labels}},
  author    = {Mohanty, Saumyaranjan and Vattivella, Deexitha and Mopuri, Konda Reddy},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/mohanty2026tmlr-rethinking/}
}