Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box Reconstruction
Abstract
We present a novel self-supervised approach for representation learning, particularly for the task of Visual Relationship Detection (VRD). Motivated by the effectiveness of Masked Image Modeling (MIM), we propose Masked Bounding Box Reconstruction (MBBR), a variation of MIM where a percentage of the entities/objects within a scene are masked and subsequently reconstructed based on the unmasked objects. The core idea is that, through object-level masked modeling, the network learns context-aware representations that capture the interaction of objects within a scene and thus are highly predictive of visual object relationships. We extensively evaluate learned representations, both qualitatively and quantitatively, in a few-shot setting and demonstrate the efficacy of MBBR for learning robust visual representations, particularly tailored for VRD. The proposed method is able to surpass state-of-the-art VRD methods on the Predicate Detection (PredDet) evaluation setting, using only a few annotated samples. We make our code available at https://github.com/deeplab-ai/SelfSupervisedVRD.
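The masking-and-reconstruction idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of object-level masked modeling under stated assumptions. Each object is assumed to be represented by a plain feature vector, masked objects are replaced by a fixed mask token, and the loss is a mean squared error computed over the masked objects only. The function names, the zero mask token, the choice of loss, and the `mean_context_predictor` stand-in (which takes the place of a learned network) are all hypothetical.

```python
import random


def mask_objects(num_objects, mask_ratio, rng):
    """Choose which object indices to mask.

    Always masks at least one object and leaves at least one visible,
    so there is both something to reconstruct and some context.
    """
    if num_objects < 2:
        raise ValueError("need at least two objects: one masked, one visible")
    num_masked = max(1, min(num_objects - 1, round(num_objects * mask_ratio)))
    return sorted(rng.sample(range(num_objects), num_masked))


def apply_mask(features, masked_idx, mask_token):
    """Return a copy of the features with masked objects replaced by the mask token."""
    masked = set(masked_idx)
    return [list(mask_token) if i in masked else list(f)
            for i, f in enumerate(features)]


def mean_context_predictor(masked_features, masked_idx):
    """Hypothetical stand-in for a learned model.

    Fills every masked slot with the mean of the visible object features.
    """
    masked = set(masked_idx)
    visible = [f for i, f in enumerate(masked_features) if i not in masked]
    dim = len(visible[0])
    mean = [sum(f[d] for f in visible) / len(visible) for d in range(dim)]
    return [list(mean) if i in masked else list(f)
            for i, f in enumerate(masked_features)]


def masked_reconstruction_loss(predicted, target, masked_idx):
    """Mean squared error over the masked objects only (assumed loss)."""
    total, count = 0.0, 0
    for i in masked_idx:
        for p, t in zip(predicted[i], target[i]):
            total += (p - t) ** 2
            count += 1
    return total / count


# Toy scene: four objects, each with a 2-dimensional feature vector.
scene = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
rng = random.Random(0)
masked_idx = mask_objects(len(scene), mask_ratio=0.5, rng=rng)
masked_scene = apply_mask(scene, masked_idx, mask_token=[0.0, 0.0])
prediction = mean_context_predictor(masked_scene, masked_idx)
loss = masked_reconstruction_loss(prediction, scene, masked_idx)
```

In a real training setup the placeholder predictor would be replaced by a network that sees the visible objects and is optimized to lower this loss, which is what pushes it toward context-aware object representations.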
Cite
Text
Anastasakis et al. "Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box Reconstruction." Winter Conference on Applications of Computer Vision, 2024.
Markdown
[Anastasakis et al. "Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box Reconstruction." Winter Conference on Applications of Computer Vision, 2024.](https://mlanthology.org/wacv/2024/anastasakis2024wacv-selfsupervised/)
BibTeX
@inproceedings{anastasakis2024wacv-selfsupervised,
title = {{Self-Supervised Learning for Visual Relationship Detection Through Masked Bounding Box Reconstruction}},
author = {Anastasakis, Zacharias and Mallis, Dimitrios and Diomataris, Markos and Alexandridis, George and Kollias, Stefanos and Pitsikalis, Vassilis},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2024},
pages = {1206-1215},
url = {https://mlanthology.org/wacv/2024/anastasakis2024wacv-selfsupervised/}
}