Partially Does It: Towards Scene-Level FG-SBIR with Partial Input

Abstract

We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exist significant empty (white) regions as a result of object-level abstraction, and consequently, (iii) existing scene-level fine-grained sketch-based image retrieval methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partially-aware fashion. Importantly, we improve upon OT to further account for holistic partialness by comparing intra-modal adjacency matrices. Our proposed method is not only robust to partial scene sketches but also yields state-of-the-art performance on existing datasets.
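The set-based OT matching described in the abstract can be illustrated with a minimal entropic-OT (Sinkhorn) sketch: region embeddings from a sketch and a photo form two point sets, a pairwise cost matrix is built, and the resulting transport cost serves as a retrieval score. This is an illustrative toy, not the authors' implementation; the feature dimensions, cost choice, and hyperparameters here are assumptions.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropic OT between uniform marginals via Sinkhorn iterations.

    Returns a transport plan T whose row/column sums approximate the
    uniform marginals over sketch and photo regions.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass over sketch regions
    b = np.full(m, 1.0 / m)          # uniform mass over photo regions
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iters):         # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# Toy example: 3 sketch regions matched against 4 photo regions.
# Embedding dimension 2 is a placeholder for real region features.
rng = np.random.default_rng(0)
S = rng.normal(size=(3, 2))          # hypothetical sketch region embeddings
P = rng.normal(size=(4, 2))          # hypothetical photo region embeddings
C = np.linalg.norm(S[:, None] - P[None, :], axis=-1)  # pairwise L2 cost
T = sinkhorn(C)
score = (T * C).sum()                # OT cost usable as a retrieval distance
```

Because OT matches sets rather than fixed grids, sketch regions with no counterpart (the "partial" case) only spread their small uniform mass, rather than forcing a hard one-to-one alignment.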

Cite

Text

Chowdhury et al. "Partially Does It: Towards Scene-Level FG-SBIR with Partial Input." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00243

Markdown

[Chowdhury et al. "Partially Does It: Towards Scene-Level FG-SBIR with Partial Input." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/chowdhury2022cvpr-partially/) doi:10.1109/CVPR52688.2022.00243

BibTeX

@inproceedings{chowdhury2022cvpr-partially,
  title     = {{Partially Does It: Towards Scene-Level FG-SBIR with Partial Input}},
  author    = {Chowdhury, Pinaki Nath and Bhunia, Ayan Kumar and Gajjala, Viswanatha Reddy and Sain, Aneeshan and Xiang, Tao and Song, Yi-Zhe},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {2395--2405},
  doi       = {10.1109/CVPR52688.2022.00243},
  url       = {https://mlanthology.org/cvpr/2022/chowdhury2022cvpr-partially/}
}