Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning
Abstract
We study a limited-label problem and present a novel approach to Single-Positive Multi-Label Learning. In the multi-label learning setting, a model learns to predict multiple labels or categories for a single input image. This contrasts with standard multi-class image classification, where the task is to predict a single label from many possible labels for an image. Single-Positive Multi-Label Learning specifically considers learning to predict multiple labels when only one annotation per image is available in the training data. Multi-label learning is a more natural task than single-label learning because real-world data often involves instances belonging to multiple categories simultaneously; however, most computer vision datasets contain single labels due to the inherent complexity and cost of collecting multiple high-quality annotations per image. We propose a novel approach called Vision-Language Pseudo-Labeling, which uses a vision-language model, CLIP, to suggest strong positive and negative pseudo-labels. Experimental results demonstrate the effectiveness of the proposed approach. Our code and data will be made publicly available at https://github.com/mvrl/VLPL.
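For intuition, the sketch below shows one way CLIP can produce positive and negative pseudo-labels from image-text similarity. It is a minimal illustration under assumed thresholds and a generic prompt template, not the authors' implementation (see https://github.com/mvrl/VLPL for the actual method); it uses the open-source openai/CLIP package.

# Minimal sketch: CLIP-based pseudo-labeling for single-positive multi-label learning.
# Thresholds, prompt template, and class names below are illustrative assumptions.
import torch
import clip  # https://github.com/openai/CLIP
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

class_names = ["dog", "cat", "person", "bicycle"]  # example label space
prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    image_feat = model.encode_image(image)
    text_feat = model.encode_text(prompts)
    # Normalize so the dot product is a cosine similarity per class
    image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    sims = (image_feat @ text_feat.T).squeeze(0)

# Assumed thresholds: classes CLIP scores highly become positive pseudo-labels,
# classes it scores very low become negatives; the rest remain unknown.
pos_thresh, neg_thresh = 0.30, 0.15
positive_pseudo = [c for c, s in zip(class_names, sims.tolist()) if s > pos_thresh]
negative_pseudo = [c for c, s in zip(class_names, sims.tolist()) if s < neg_thresh]
print(positive_pseudo, negative_pseudo)

In a training pipeline, such pseudo-labels would typically supplement the single observed positive per image when computing the multi-label loss.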
Cite
Text
Xing et al. "Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00776
Markdown
[Xing et al. "Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/xing2024cvprw-visionlanguage/) doi:10.1109/CVPRW63382.2024.00776
BibTeX
@inproceedings{xing2024cvprw-visionlanguage,
title = {{Vision-Language Pseudo-Labels for Single-Positive Multi-Label Learning}},
author = {Xing, Xin and Xiong, Zhexiao and Stylianou, Abby and Sastry, Srikumar and Gong, Liyu and Jacobs, Nathan},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {7799-7808},
doi = {10.1109/CVPRW63382.2024.00776},
url = {https://mlanthology.org/cvprw/2024/xing2024cvprw-visionlanguage/}
}