Adaptive Cross-Modal Few-Shot Learning
Abstract
Metric-based meta-learning techniques have been successfully applied to few-shot classification problems. However, leveraging cross-modal information in a few-shot setting has yet to be explored. When support from visual information is limited in few-shot image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to aid learning. Based on this intuition, we design a model that leverages both visual and semantic features in the context of few-shot classification. We propose an adaptive mechanism that effectively combines both modalities conditioned on categories. Through a series of experiments, we show that our method boosts the performance of metric-based approaches by effectively exploiting language structure. Using this extra modality, our model surpasses current unimodal state-of-the-art methods by a large margin on miniImageNet. The improvement in performance is particularly large when the number of shots is small.
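The abstract describes an adaptive, category-conditioned combination of visual and semantic features on top of a metric-based few-shot classifier. The sketch below is one plausible reading of that idea, not the authors' code: it assumes a learned convex combination of a visual class prototype and a projected label embedding, with the mixing coefficient predicted from the semantic cue. All module names, dimensions, and the gating architecture are illustrative assumptions.

```python
# Hedged sketch (not the paper's implementation): adaptively mix a visual
# prototype with a semantic prototype per category via a learned convex
# combination, then classify queries by distance to the mixed prototypes.
import torch
import torch.nn as nn


class AdaptiveModalityMixer(nn.Module):
    def __init__(self, visual_dim=512, semantic_dim=300):
        super().__init__()
        # Map the label's word embedding into the visual feature space.
        self.semantic_proj = nn.Linear(semantic_dim, visual_dim)
        # Predict a per-category mixing coefficient in (0, 1) from the semantic cue.
        self.gate = nn.Sequential(
            nn.Linear(visual_dim, visual_dim),
            nn.ReLU(),
            nn.Linear(visual_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, visual_proto, word_emb):
        # visual_proto: (n_classes, visual_dim), mean of support features per class
        # word_emb:     (n_classes, semantic_dim), label embeddings (e.g. GloVe)
        semantic_proto = self.semantic_proj(word_emb)
        lam = self.gate(semantic_proto)                     # (n_classes, 1)
        return lam * visual_proto + (1.0 - lam) * semantic_proto


# Usage: score queries by negative squared distance to the mixed prototypes,
# as in metric-based approaches such as prototypical networks.
mixer = AdaptiveModalityMixer()
visual_proto = torch.randn(5, 512)     # 5-way episode
word_emb = torch.randn(5, 300)
prototypes = mixer(visual_proto, word_emb)
query = torch.randn(8, 512)
logits = -torch.cdist(query, prototypes) ** 2    # (8, 5) class scores
```

With a gate of this kind, the model can lean on the semantic prototype when visual support is scarce (low-shot) and on the visual prototype when more support images are available, which matches the abstract's claim that gains are largest for small shot counts.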
Cite
Xing et al. "Adaptive Cross-Modal Few-Shot Learning." ICLR 2019 Workshops: LLD, 2019.
BibTeX
@inproceedings{xing2019iclrw-adaptive,
title = {{Adaptive Cross-Modal Few-Shot Learning}},
author = {Xing, Chen and Rostamzadeh, Negar and Oreshkin, Boris N. and Pinheiro, Pedro O.},
booktitle = {ICLR 2019 Workshops: LLD},
year = {2019},
url = {https://mlanthology.org/iclrw/2019/xing2019iclrw-adaptive/}
}