SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions

Abstract

We introduce a new dataset, SUN-Spot, for localizing objects using spatial referring expressions (REs). SUN-Spot is the only RE dataset that uses RGB-D images. It also contains a greater average number of spatial prepositions and more cluttered scenes than previous RE datasets. Using a simple baseline, we show that including a depth channel in RE models can improve performance on both generation and comprehension.

Cite

Text

Mauceri et al. "SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00236

Markdown

[Mauceri et al. "SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/mauceri2019iccvw-sunspot/) doi:10.1109/ICCVW.2019.00236

BibTeX

@inproceedings{mauceri2019iccvw-sunspot,
  title     = {{SUN-Spot: An RGB-D Dataset with Spatial Referring Expressions}},
  author    = {Mauceri, Cecilia and Palmer, Martha and Heckman, Christoffer},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {1883--1886},
  doi       = {10.1109/ICCVW.2019.00236},
  url       = {https://mlanthology.org/iccvw/2019/mauceri2019iccvw-sunspot/}
}