Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling

Abstract

Predicting the target of visual search from human gaze data is a challenging problem. In contrast to previous work that focused on predicting specific instances of search targets, we propose the first approach to predict a target's category and attributes. However, state-of-the-art models for categorical recognition require large amounts of training data, which is prohibitive for gaze data. We thus propose a novel Gaze Pooling Layer that integrates gaze information and CNN-based features via an attention mechanism that incorporates both spatial and temporal aspects of gaze behaviour. We show that our approach can leverage pre-trained CNN architectures, thus eliminating the need for expensive joint data collection of image and gaze data. We demonstrate the effectiveness of our method on a new 14-participant dataset, and indicate directions for future research in the gaze-based prediction of mental states.
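The core idea of attention-based gaze pooling can be illustrated with a minimal sketch: weight a CNN feature map spatially by a gaze fixation density map, pool to a feature vector, and average over time across fixations. The function names, the use of NumPy, and the exact normalization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def gaze_pooling(feature_map, fixation_map):
    """Weight CNN features by a gaze fixation density map, then pool spatially.

    feature_map:  (C, H, W) array of CNN activations (illustrative shape).
    fixation_map: (H, W) non-negative gaze density map.
    Returns a (C,) gaze-weighted feature vector.
    """
    # Normalize the fixation map into a spatial attention distribution.
    w = fixation_map / (fixation_map.sum() + 1e-8)
    # Attention-weighted spatial average pooling.
    return (feature_map * w[None, :, :]).sum(axis=(1, 2))

def temporal_gaze_pooling(feature_maps, fixation_maps):
    """Aggregate gaze-pooled features over a fixation sequence (temporal aspect)."""
    pooled = [gaze_pooling(f, g) for f, g in zip(feature_maps, fixation_maps)]
    return np.mean(pooled, axis=0)
```

Because the pooling operates on any pre-computed feature map, it can sit on top of a frozen, pre-trained CNN, which is the property the abstract highlights for avoiding joint image-and-gaze training data.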

Cite

Text

Sattar et al. "Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling." IEEE/CVF International Conference on Computer Vision Workshops, 2017. doi:10.1109/ICCVW.2017.322

Markdown

[Sattar et al. "Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling." IEEE/CVF International Conference on Computer Vision Workshops, 2017.](https://mlanthology.org/iccvw/2017/sattar2017iccvw-predicting/) doi:10.1109/ICCVW.2017.322

BibTeX

@inproceedings{sattar2017iccvw-predicting,
  title     = {{Predicting the Category and Attributes of Visual Search Targets Using Deep Gaze Pooling}},
  author    = {Sattar, Hosnieh and Bulling, Andreas and Fritz, Mario},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2017},
  pages     = {2740-2748},
  doi       = {10.1109/ICCVW.2017.322},
  url       = {https://mlanthology.org/iccvw/2017/sattar2017iccvw-predicting/}
}