Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos

Abstract

This work aims to develop a computer-vision technique for understanding objects jointly attended by a group of people during social interactions. As a key tool for discovering such objects of joint attention, we rely on a collection of wearable eye-tracking cameras that provide first-person videos of interaction scenes along with point-of-gaze data from the interacting parties. Technically, we propose a hierarchical conditional random field (CRF)-based model that can 1) localize events of joint attention temporally and 2) segment objects of joint attention spatially. We show that by alternating these two procedures, objects of joint attention can be discovered reliably even from cluttered scenes and noisy point-of-gaze data. Experimental results demonstrate that our approach outperforms several state-of-the-art methods for co-segmentation and joint attention discovery.
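The alternating structure described above can be illustrated with a toy sketch. This is not the paper's hierarchical CRF model; it only mirrors the two steps at a conceptual level, under simplifying assumptions: joint attention is approximated as frames where the wearers' 2-D gaze points converge (temporal localization), and the "segmented object" is a fixed-radius disc around the gaze centroid (spatial segmentation). All function names and thresholds here are illustrative stand-ins.

```python
# Conceptual sketch only (NOT the authors' method): temporal
# localization followed by a toy spatial step. The paper alternates
# these procedures iteratively within a hierarchical CRF.
from statistics import mean


def gaze_dispersion(points):
    """Mean distance of each gaze point from the points' centroid."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return mean(((p[0] - cx) ** 2 + (p[1] - cy) ** 2) ** 0.5 for p in points)


def localize_events(gaze_per_frame, threshold):
    """Temporal step: keep frames where all wearers' gaze converges."""
    return [t for t, pts in enumerate(gaze_per_frame)
            if gaze_dispersion(pts) < threshold]


def segment_object(points, radius=20.0):
    """Spatial step (toy stand-in): a disc around the gaze centroid,
    standing in for the paper's pixel-wise object segmentation."""
    cx = mean(p[0] for p in points)
    cy = mean(p[1] for p in points)
    return (cx, cy, radius)


def discover_joint_attention(gaze_per_frame, threshold=30.0):
    """One pass of each step; the actual model iterates until stable."""
    frames = localize_events(gaze_per_frame, threshold)
    return {t: segment_object(gaze_per_frame[t]) for t in frames}


# Two wearers fixate the same spot in frame 0, then diverge in frame 1.
gaze = [[(100, 100), (105, 98)], [(100, 100), (400, 300)]]
result = discover_joint_attention(gaze)
```

In this example only frame 0 is kept as a joint-attention event, since the two gaze points in frame 1 are far apart; the real model instead reasons jointly over noisy gaze and appearance cues across all videos.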

Cite

Text

Huang et al. "Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2017. doi:10.1109/ICCVW.2017.273

Markdown

[Huang et al. "Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2017.](https://mlanthology.org/iccvw/2017/huang2017iccvw-temporal/) doi:10.1109/ICCVW.2017.273

BibTeX

@inproceedings{huang2017iccvw-temporal,
  title     = {{Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos}},
  author    = {Huang, Yifei and Cai, Minjie and Kera, Hiroshi and Yonetani, Ryo and Higuchi, Keita and Sato, Yoichi},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2017},
  pages     = {2313--2321},
  doi       = {10.1109/ICCVW.2017.273},
  url       = {https://mlanthology.org/iccvw/2017/huang2017iccvw-temporal/}
}