When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes
Abstract
Context is of fundamental importance to both human and machine vision; e.g., an object in the air is more likely to be an airplane than a pig. The rich notion of context incorporates several aspects including physics rules, statistical co-occurrences, and relative object sizes, among others. While previous work has focused on crowd-sourced out-of-context photographs from the web to study scene context, controlling the nature and extent of contextual violations has been a daunting task. Here we introduce a diverse, synthetic Out-of-Context Dataset (OCD) with fine-grained control over scene context. By leveraging a 3D simulation engine, we systematically control the gravity, object co-occurrences and relative sizes across 36 object categories in a virtual household environment. We conducted a series of experiments to gain insights into the impact of contextual cues on both human and machine vision using OCD. We conducted psychophysics experiments to establish a human benchmark for out-of-context recognition and then compared it with state-of-the-art computer vision models to quantify the gap between the two. We propose a context-aware recognition transformer model, fusing object and contextual information via multi-head attention. Our model captures useful information for contextual reasoning, enabling human-level performance and better robustness in out-of-context conditions compared to baseline models across OCD and other out-of-context datasets. All source code and data are publicly available at https://github.com/kreimanlab/WhenPigsFlyContext
Cite
Text
Bomatter et al. "When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00032
Markdown
[Bomatter et al. "When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/bomatter2021iccv-pigs/) doi:10.1109/ICCV48922.2021.00032
BibTeX
@inproceedings{bomatter2021iccv-pigs,
title = {{When Pigs Fly: Contextual Reasoning in Synthetic and Natural Scenes}},
author = {Bomatter, Philipp and Zhang, Mengmi and Karev, Dimitar and Madan, Spandan and Tseng, Claire and Kreiman, Gabriel},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {255-264},
doi = {10.1109/ICCV48922.2021.00032},
url = {https://mlanthology.org/iccv/2021/bomatter2021iccv-pigs/}
}