Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos
Abstract
This work aims to develop a computer-vision technique for understanding objects jointly attended by a group of people during social interactions. As a key tool to discover such objects of joint attention, we rely on a collection of wearable eye-tracking cameras that provide a first-person video of interaction scenes and points-of-gaze data of interacting parties. Technically, we propose a hierarchical conditional random field-based model that can 1) localize events of joint attention temporally and 2) segment objects of joint attention spatially. We show that by alternating these two procedures, objects of joint attention can be discovered reliably even from cluttered scenes and noisy points-of-gaze data. Experimental results demonstrate that our approach outperforms several state-of-the-art methods for co-segmentation and joint attention discovery.
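The abstract describes alternating between two procedures, temporally localizing joint-attention events and spatially segmenting the attended objects, until the discovery stabilizes. A minimal sketch of that alternating loop is below; all function names and data structures are illustrative placeholders, not the authors' hierarchical CRF implementation.

```python
# Hypothetical sketch of the alternating scheme from the abstract:
# localize joint-attention events in time, segment the attended object
# in space, and repeat until the estimates stop changing.

def localize_events(gaze_points, current_masks):
    # Placeholder: mark frames where gaze points from at least two
    # wearers are available (a real system would test agreement between
    # gaze and the current object masks).
    return [t for t, pts in enumerate(gaze_points) if len(pts) >= 2]

def segment_objects(frames, event_frames):
    # Placeholder: within localized frames, a real system would grow a
    # segmentation mask around the gaze points.
    return {t: {"mask": f"mask_at_{t}"} for t in event_frames}

def discover_joint_attention(frames, gaze_points, max_iters=10):
    """Alternate localization and segmentation until convergence."""
    masks, events = {}, []
    for _ in range(max_iters):
        new_events = localize_events(gaze_points, masks)
        masks = segment_objects(frames, new_events)
        if new_events == events:  # converged: event set is stable
            break
        events = new_events
    return events, masks
```

The loop structure is the point here: each pass uses the latest object masks to refine the event localization, and the refined events to refine the masks, which is how noisy gaze data can be cleaned up iteratively.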
Cite
Text
Huang et al. "Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2017. doi:10.1109/ICCVW.2017.273
Markdown
[Huang et al. "Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2017.](https://mlanthology.org/iccvw/2017/huang2017iccvw-temporal/) doi:10.1109/ICCVW.2017.273
BibTeX
@inproceedings{huang2017iccvw-temporal,
title = {{Temporal Localization and Spatial Segmentation of Joint Attention in Multiple First-Person Videos}},
author = {Huang, Yifei and Cai, Minjie and Kera, Hiroshi and Yonetani, Ryo and Higuchi, Keita and Sato, Yoichi},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2017},
pages = {2313--2321},
doi = {10.1109/ICCVW.2017.273},
url = {https://mlanthology.org/iccvw/2017/huang2017iccvw-temporal/}
}