Focus Your Attention When Few-Shot Classification
Abstract
Since many pre-trained vision transformers have emerged and provide strong representations for various downstream tasks, we aim to adapt them to few-shot image classification tasks in this work. The input images typically contain multiple entities. The model may not focus on the class-relevant entities of the current few-shot task, even after fine-tuning on the support samples, and the noise from class-irrelevant entities harms performance. To this end, we first propose a method that uses attention and gradient information to automatically locate the positions of the key entities in the support images, denoted as position prompts. We then employ a cross-entropy loss between their many-hot representation and the attention logits to optimize the model to focus its attention on the key entities during fine-tuning. This ability can then generalize to the query samples. Our method is applicable to different vision transformers (e.g., columnar or pyramidal ones) and to different pre-training paradigms (e.g., single-modal or vision-language pre-training). Extensive experiments show that our method improves the performance of both full and parameter-efficient fine-tuning on few-shot tasks. Code is available at https://github.com/Haoqing-Wang/FORT.
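To make the two-step recipe in the abstract concrete, here is a minimal PyTorch-style sketch: score patches by an attention-times-gradient signal to pick key-entity positions, then apply a soft-label cross-entropy between the resulting many-hot mask and the [CLS] token's attention logits. The function names, tensor shapes, and the top-k selection are illustrative assumptions, not the paper's exact implementation; see the linked repository for the authors' code.

```python
import torch
import torch.nn.functional as F

def locate_key_positions(cls_attn, cls_attn_grad, top_k=16):
    """Score each patch by attention x gradient (Grad-CAM style) and keep the
    top-k patch indices as the position prompt for one support image.

    cls_attn:      (H, N) [CLS]-to-patch attention from a transformer block.
    cls_attn_grad: (H, N) gradient of the class logit w.r.t. that attention.
    Note: top_k and the head-averaging are illustrative choices.
    """
    scores = (cls_attn * cls_attn_grad).clamp(min=0).mean(dim=0)  # (N,)
    return scores.topk(top_k).indices  # patch indices of key entities

def attention_focus_loss(attn_logits, position_prompts):
    """Cross-entropy between the many-hot position prompts and the [CLS]
    token's attention logits, pushing attention onto the key entities.

    attn_logits:      (B, N) pre-softmax [CLS]-to-patch scores (head-averaged).
    position_prompts: (B, N) many-hot masks marking key-entity patches
                      (assumed to contain at least one nonzero entry per row).
    """
    # Normalize the many-hot mask into a uniform distribution over key patches,
    # which turns the objective into a standard soft-label cross-entropy.
    target = position_prompts / position_prompts.sum(dim=-1, keepdim=True)
    log_probs = F.log_softmax(attn_logits, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()
```

During fine-tuning, this loss would be added to the usual classification loss on the support samples, so the attention constraint is only ever imposed where labels (and hence position prompts) are available.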
Cite
Text
Wang et al. "Focus Your Attention When Few-Shot Classification." Neural Information Processing Systems, 2023.
Markdown
[Wang et al. "Focus Your Attention When Few-Shot Classification." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/wang2023neurips-focus-a/)
BibTeX
@inproceedings{wang2023neurips-focus-a,
  title     = {{Focus Your Attention When Few-Shot Classification}},
  author    = {Wang, Haoqing and Jie, Shibo and Deng, Zhihong},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/wang2023neurips-focus-a/}
}