Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning

Abstract

Deep learning in general domains has been steadily extended to domain-specific tasks that require recognizing fine-grained characteristics. However, real-world applications of fine-grained tasks face two challenges: a heavy reliance on expert knowledge for annotation, and the need for a versatile model that serves various downstream tasks in a specific domain (e.g., prediction of categories, bounding boxes, or pixel-wise annotations). Fortunately, recent self-supervised learning (SSL) is a promising approach for pretraining a model without annotations, serving as an effective initialization for any downstream task. Since SSL does not rely on annotations, it generally utilizes a large-scale unlabeled dataset, referred to as an open-set. In this sense, we introduce a novel Open-Set Self-Supervised Learning problem under the assumption that a large-scale unlabeled open-set is available, along with the fine-grained target dataset, during the pretraining phase. In our problem setup, it is crucial to consider the distribution mismatch between the open-set and the target dataset. Hence, we propose the SimCore algorithm to sample a coreset, the subset of an open-set that has a minimum distance to the target dataset in the latent space. We demonstrate that SimCore significantly improves representation learning performance through extensive experimental settings, including eleven fine-grained datasets and seven open-sets in various downstream tasks.
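
The idea of picking the open-set samples that lie closest to the target dataset in a latent space can be illustrated with a small sketch. This is not the SimCore algorithm as published, and the paper's actual selection procedure may differ. It is a simplified nearest-neighbor ranking under stated assumptions: features are already extracted by some encoder, closeness is measured by cosine similarity, and the coreset size `budget` is fixed in advance. The function names `cosine_similarity` and `select_coreset` and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_coreset(open_set_feats, target_feats, budget):
    """Return indices of the `budget` open-set samples closest to the target set.

    Assumption for this sketch: an open-set sample's closeness is its highest
    cosine similarity to any target sample.
    """
    scores = [
        max(cosine_similarity(x, t) for t in target_feats)
        for x in open_set_feats
    ]
    # Rank open-set indices from most to least similar to the target set.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy 2-D "latent features" (hypothetical values, not from the paper).
target = [[1.0, 0.0], [0.9, 0.1]]
open_set = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0], [0.8, 0.2]]

coreset = select_coreset(open_set, target, budget=2)
print(coreset)  # indices of the two open-set samples nearest the target set
```

In this toy case the two open-set vectors pointing in nearly the same direction as the target vectors are selected, while the orthogonal and opposite vectors are left out. In the setting the abstract describes, the selected subset would then be used together with the target dataset during self-supervised pretraining.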

Cite

Text

Kim et al. "Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00728

Markdown

[Kim et al. "Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/kim2023cvpr-coreset/) doi:10.1109/CVPR52729.2023.00728

BibTeX

@inproceedings{kim2023cvpr-coreset,
  title     = {{Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning}},
  author    = {Kim, Sungnyun and Bae, Sangmin and Yun, Se-Young},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {7537-7547},
  doi       = {10.1109/CVPR52729.2023.00728},
  url       = {https://mlanthology.org/cvpr/2023/kim2023cvpr-coreset/}
}