Grafit: Learning Fine-Grained Image Representations with Coarse Labels

Abstract

This paper tackles the problem of learning a finer representation than the one provided by training labels. This enables fine-grained category retrieval of images in a collection annotated with coarse labels only. Our network is learned with a nearest-neighbor classifier objective, and an instance loss inspired by self-supervised learning. By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods. Our strategy outperforms all competing methods for retrieving or classifying images at a finer granularity than that available at train time. It also improves the accuracy for transfer learning tasks to fine-grained datasets.

Cite

Text

Touvron et al. "Grafit: Learning Fine-Grained Image Representations with Coarse Labels." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00091

Markdown

[Touvron et al. "Grafit: Learning Fine-Grained Image Representations with Coarse Labels." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/touvron2021iccv-grafit/) doi:10.1109/ICCV48922.2021.00091

BibTeX

@inproceedings{touvron2021iccv-grafit,
  title     = {{Grafit: Learning Fine-Grained Image Representations with Coarse Labels}},
  author    = {Touvron, Hugo and Sablayrolles, Alexandre and Douze, Matthijs and Cord, Matthieu and Jégou, Hervé},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {874--884},
  doi       = {10.1109/ICCV48922.2021.00091},
  url       = {https://mlanthology.org/iccv/2021/touvron2021iccv-grafit/}
}