TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias
Abstract
We identify a critical bias in contemporary CLIP-based models, which we denote as single tag bias. This bias manifests as a disproportionate focus on a singular tag (word) while neglecting other pertinent tags, stemming from CLIP embeddings prioritizing one specific tag in image-text relationships. In this paper, we introduce a novel two-step fine-tuning approach, Text-Tag Self-Distillation (TTD), to address this challenge. We first extract all image-relevant tags from text based on their similarity to the nearest pixels. Then, we distill a combined mask containing the extracted tags’ content to a text-derived mask. This approach ensures the unbiased image-text alignment of the CLIP-based models using only image-text pairs without necessitating additional supervision. Our technique demonstrates model-agnostic improvements in multi-tag classification and segmentation tasks, surpassing competing methods that rely on external resources. The code and data are available at https://github.com/shjo-april/ TTD.
Cite
Text
Jo et al. "TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73004-7_20Markdown
[Jo et al. "TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/jo2024eccv-ttd/) doi:10.1007/978-3-031-73004-7_20BibTeX
@inproceedings{jo2024eccv-ttd,
title = {{TTD: Text-Tag Self-Distillation Enhancing Image-Text Alignment in CLIP to Alleviate Single Tag Bias}},
author = {Jo, Sanghyun and Ryu, Soohyun and Kim, Sungyub and Yang, Eunho and Kim, Kyungsu},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73004-7_20},
url = {https://mlanthology.org/eccv/2024/jo2024eccv-ttd/}
}