Using Sentences as Semantic Representations in Large Scale Zero-Shot Learning
Abstract
Zero-shot learning (ZSL) aims to recognize instances of unseen classes, for which no visual instance is available during training, by learning multimodal relations between samples from seen classes and corresponding class semantic representations. These class representations usually consist of either attributes, which do not scale well to large datasets, or word embeddings, which lead to poorer performance. A good trade-off could be to employ short sentences in natural language as class descriptions. We explore different solutions to use such short descriptions in a ZSL setting and show that while simple methods cannot achieve very good results with sentences alone, a combination of usual word embeddings and sentences can significantly outperform the current state of the art.
Cite
Text
Le Cacheux et al. "Using Sentences as Semantic Representations in Large Scale Zero-Shot Learning." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66415-2_42
Markdown
[Le Cacheux et al. "Using Sentences as Semantic Representations in Large Scale Zero-Shot Learning." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/cacheux2020eccvw-using/) doi:10.1007/978-3-030-66415-2_42
BibTeX
@inproceedings{cacheux2020eccvw-using,
title = {{Using Sentences as Semantic Representations in Large Scale Zero-Shot Learning}},
author = {Le Cacheux, Yannick and Le Borgne, Hervé and Crucianu, Michel},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {641--645},
doi = {10.1007/978-3-030-66415-2_42},
url = {https://mlanthology.org/eccvw/2020/cacheux2020eccvw-using/}
}