Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval
Abstract
Zero-shot learning offers an efficient way for a machine learning model to handle unseen categories, avoiding exhaustive data collection. Zero-shot Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where collecting paired sketch-photo samples is hard and costly. We propose a novel framework that indirectly aligns sketches and photos by contrasting them through texts, removing the need for access to sketch-photo pairs. With an explicit modality encoding learned from data, our approach disentangles modality-agnostic semantics from modality-specific information, bridging the modality gap and enabling effective cross-modal content retrieval within a joint latent space. Comprehensive experiments verify the efficacy of the proposed model on ZS-SBIR, and it can also be applied to generalized and fine-grained settings.
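To make the core idea concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of how indirect alignment through text might look: sketch and photo features are never contrasted with each other directly; each is pulled toward text embeddings via a contrastive loss, and a learned per-modality embedding stands in for the paper's explicit modality encoding. All module and variable names (`ModalityAwareEncoder`, `info_nce`, the subtraction-based disentanglement) are illustrative assumptions.

```python
# Hypothetical sketch, assuming PyTorch and a frozen backbone providing
# 512-d features; this is NOT the authors' code, only an illustration of
# text-mediated alignment with a learned modality embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareEncoder(nn.Module):
    """Projects backbone features and removes a learned modality embedding
    to obtain a modality-agnostic semantic representation (assumed design)."""
    def __init__(self, dim=512, num_modalities=2):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        # one learned vector per modality (0 = sketch, 1 = photo)
        self.modality_emb = nn.Embedding(num_modalities, dim)

    def forward(self, feat, modality_id):
        z = self.proj(feat)
        m = self.modality_emb(modality_id)   # modality-specific component
        semantic = z - m                      # modality-agnostic component
        return F.normalize(semantic, dim=-1), m

def info_nce(a, b, tau=0.07):
    """Symmetric InfoNCE between two batches of unit-norm embeddings."""
    logits = a @ b.t() / tau
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: each modality is contrasted only against class-name text
# embeddings, so no sketch-photo pairs are required during training.
encoder = ModalityAwareEncoder()
sketch_feat = torch.randn(8, 512)                        # e.g. backbone features
photo_feat  = torch.randn(8, 512)
text_emb    = F.normalize(torch.randn(8, 512), dim=-1)   # e.g. text-encoder features

z_sketch, _ = encoder(sketch_feat, torch.zeros(8, dtype=torch.long))
z_photo,  _ = encoder(photo_feat,  torch.ones(8, dtype=torch.long))

loss = info_nce(z_sketch, text_emb) + info_nce(z_photo, text_emb)
loss.backward()
```

At retrieval time, under these assumptions, both modalities would be embedded with their modality component removed, so nearest-neighbor search in the shared semantic space compares content rather than style.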
Cite
Text
Lyou et al. "Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval." Winter Conference on Applications of Computer Vision, 2024.
Markdown
[Lyou et al. "Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/lyou2024wacv-modalityaware/)
BibTeX
@inproceedings{lyou2024wacv-modalityaware,
title = {{Modality-Aware Representation Learning for Zero-Shot Sketch-Based Image Retrieval}},
author = {Lyou, Eunyi and Lee, Doyeon and Kim, Jooeun and Lee, Joonseok},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2024},
pages = {5646--5655},
url = {https://mlanthology.org/wacv/2024/lyou2024wacv-modalityaware/}
}