CLAREL: Classification via Retrieval Loss for Zero-Shot Learning

Abstract

We address the problem of learning cross-modal representations. We propose an instance-based deep metric learning approach in a joint visual and textual space. The key novelty of this paper is showing that per-image semantic supervision leads to a substantial improvement in zero-shot performance over class-only supervision. We also provide a probabilistic justification, and empirical validation, for a metric rescaling approach that balances seen/unseen accuracy in the GZSL task. We evaluate our approach on two fine-grained zero-shot datasets: CUB and FLOWERS.
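To make the abstract's two ideas concrete, here is a minimal NumPy sketch of (a) an instance-based cross-modal retrieval loss, where each image in a batch must retrieve its own paired text description and vice versa (per-image semantic supervision), and (b) a test-time rescaling that penalizes seen-class scores to balance seen/unseen accuracy in GZSL. This is an illustration of the general technique only, not the paper's exact formulation; the names `tau` and `gamma` are assumed hyperparameters, not the paper's notation.

```python
import numpy as np

def retrieval_loss(img, txt, tau=0.1):
    """Symmetric cross-entropy retrieval loss over a batch of paired
    image/text embeddings (rows of `img` and `txt` are matched pairs).
    `tau` is an assumed temperature hyperparameter."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / tau                     # (B, B) similarity matrix
    labels = np.arange(len(img))                   # matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()        # diagonal = correct match

    # image-to-text and text-to-image retrieval directions
    return 0.5 * (xent(logits) + xent(logits.T))

def rescaled_scores(scores, seen_mask, gamma=0.25):
    """Subtract a constant from seen-class scores at test time to trade
    seen accuracy for unseen accuracy in GZSL; `gamma` is a tunable
    constant in this sketch."""
    return scores - gamma * seen_mask
```

In this toy setting, a batch of identical image/text embeddings yields a near-zero retrieval loss, while mismatched pairs yield a large one; the rescaling simply shifts the classifier's seen-class logits down by `gamma` before taking the argmax.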

Cite

Text

Oreshkin et al. "CLAREL: Classification via Retrieval Loss for Zero-Shot Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00466

Markdown

[Oreshkin et al. "CLAREL: Classification via Retrieval Loss for Zero-Shot Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/oreshkin2020cvprw-clarel/) doi:10.1109/CVPRW50498.2020.00466

BibTeX

@inproceedings{oreshkin2020cvprw-clarel,
  title     = {{CLAREL: Classification via Retrieval Loss for Zero-Shot Learning}},
  author    = {Oreshkin, Boris N. and Rostamzadeh, Negar and Pinheiro, Pedro O. and Pal, Christopher J.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {3989--3993},
  doi       = {10.1109/CVPRW50498.2020.00466},
  url       = {https://mlanthology.org/cvprw/2020/oreshkin2020cvprw-clarel/}
}