Attention Flow: End-to-End Joint Attention Estimation

Abstract

This paper addresses the problem of understanding joint attention in third-person social scene videos. Joint attention is the shared gaze behaviour of two or more individuals directed at an object or an area of interest, and it has a wide range of applications, such as human-computer interaction, educational assessment, and the treatment of patients with attention disorders. Our method, Attention Flow, learns joint attention in an end-to-end fashion by using saliency-augmented attention maps and two novel convolutional attention mechanisms that learn to select relevant features and improve joint attention localization. We compare the effects of saliency maps and attention mechanisms and report quantitative and qualitative results on the detection and localization of joint attention on the VideoCoAtt dataset, which contains complex social scenes.
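The abstract describes three ideas: augmenting attention maps with saliency, using attention to select relevant features, and then detecting and localizing joint attention. The sketch below illustrates these ideas on toy 2-D grids in plain Python. It is not the authors' architecture. The blend formula and weight `w` in `augment_with_saliency`, the sigmoid gate in `spatial_gate`, and the peak-threshold rule in `detect_and_localize` are hypothetical choices made for illustration. The actual model learns its maps with convolutional networks.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]  # a 2-D map stored as rows of floats


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def spatial_gate(features: Grid, gate_logits: Grid) -> Grid:
    """Hypothetical spatial attention: scale every cell by sigmoid(gate logit)."""
    return [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
            for f_row, g_row in zip(features, gate_logits)]


def augment_with_saliency(attention: Grid, saliency: Grid, w: float = 0.5) -> Grid:
    """Hypothetical fusion, cell by cell: (1 - w) * A + w * (A * S)."""
    return [[(1.0 - w) * a + w * a * s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(attention, saliency)]


def detect_and_localize(heatmap: Grid,
                        threshold: float = 0.5) -> Tuple[bool, Optional[Tuple[int, int]]]:
    """Report joint attention if the peak exceeds `threshold`; return the peak cell."""
    best_val, best_pos = -math.inf, (0, 0)
    for r, row in enumerate(heatmap):
        for c, v in enumerate(row):
            if v > best_val:
                best_val, best_pos = v, (r, c)
    if best_val > threshold:
        return True, best_pos
    return False, None


if __name__ == "__main__":
    attention = [[0.2, 0.6], [0.9, 0.1]]  # toy predicted attention map
    saliency = [[0.1, 1.0], [0.2, 0.0]]   # toy saliency map
    fused = augment_with_saliency(attention, saliency)
    print(detect_and_localize(attention, threshold=0.4))  # (True, (1, 0))
    print(detect_and_localize(fused, threshold=0.4))      # (True, (0, 1))
```

In this toy case, the saliency term shifts the predicted peak from cell (1, 0) to the more salient cell (0, 1), which mirrors the intuition that saliency helps disambiguate where shared gaze lands.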

Cite


Sumer et al. "Attention Flow: End-to-End Joint Attention Estimation." Winter Conference on Applications of Computer Vision, 2020.

BibTeX

@inproceedings{sumer2020wacv-attention,
  title     = {{Attention Flow: End-to-End Joint Attention Estimation}},
  author    = {Sumer, Omer and Gerjets, Peter and Trautwein, Ulrich and Kasneci, Enkelejda},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2020},
  url       = {https://mlanthology.org/wacv/2020/sumer2020wacv-attention/}
}