SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions
Abstract
We introduce a new dataset, SUN-Spot, for localizing objects using spatial referring expressions (REs). SUN-Spot is the only RE dataset which uses RGB-D images. It also contains a greater average number of spatial prepositions and more cluttered scenes than previous RE datasets. Using a simple baseline, we show that including a depth channel in RE models can improve performance on both generation and comprehension.
Cite
Text
Mauceri et al. "SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00236
Markdown
[Mauceri et al. "SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/mauceri2019iccvw-sunspot/) doi:10.1109/ICCVW.2019.00236
BibTeX
@inproceedings{mauceri2019iccvw-sunspot,
title = {{SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions}},
author = {Mauceri, Cecilia and Palmer, Martha and Heckman, Christoffer},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {1883--1886},
doi = {10.1109/ICCVW.2019.00236},
url = {https://mlanthology.org/iccvw/2019/mauceri2019iccvw-sunspot/}
}