Dual Attention Guided Gaze Target Detection in the Wild

Abstract

Gaze target detection aims to infer where each person in a scene is looking. Existing work focuses on 2D gaze and 2D saliency but fails to exploit 3D context. In this work, we propose a three-stage method that simulates human gaze inference in 3D space. In the first stage, we introduce a coarse-to-fine strategy to robustly estimate a 3D gaze orientation from the head. The predicted gaze is decomposed into a planar gaze on the image plane and a depth-channel gaze. In the second stage, we develop a Dual Attention Module (DAM), which uses the planar gaze to produce the field of view and uses the depth-channel gaze, together with depth information, to mask interfering objects. In the third stage, we use the generated dual attention as guidance to perform two sub-tasks: (1) identifying whether the gaze target is inside or outside the image; (2) locating the target if it is inside. Extensive experiments demonstrate that our approach performs favorably against state-of-the-art methods on the GazeFollow and VideoAttentionTarget datasets.
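
The dual attention idea in the second stage can be illustrated with a small sketch. This is not the paper's implementation. The cosine-based field of view, the hard depth mask, the thresholds `gaze_tol` and `depth_tol`, the depth sign convention, and all function names are assumptions made for illustration only.

```python
import math

def decompose_gaze(gaze_3d):
    """Split a 3D gaze vector into a planar gaze (x, y) and a depth-channel gaze z."""
    gx, gy, gz = gaze_3d
    return (gx, gy), gz

def fov_attention(head_xy, planar_gaze, width, height):
    """Assumed field-of-view map: cosine between the planar gaze and the
    direction from the head to each pixel, clipped at zero."""
    hx, hy = head_xy
    gx, gy = planar_gaze
    g_norm = math.hypot(gx, gy)
    fov = [[0.0] * width for _ in range(height)]
    if g_norm == 0.0:
        return fov  # no planar direction: empty field of view
    for y in range(height):
        for x in range(width):
            dx, dy = x - hx, y - hy
            d_norm = math.hypot(dx, dy)
            if d_norm == 0.0:
                continue  # the head pixel itself has no direction
            cos = (dx * gx + dy * gy) / (d_norm * g_norm)
            fov[y][x] = max(0.0, cos)
    return fov

def depth_attention(depth, head_depth, gaze_z, gaze_tol=0.1, depth_tol=0.1):
    """Assumed hard mask: keep pixels whose depth relative to the head agrees
    with the depth-channel gaze. Assumed convention: larger depth is farther
    from the camera, and gaze_z > 0 means the person looks farther away."""
    def keep(d):
        diff = d - head_depth
        if gaze_z > gaze_tol:
            return diff > depth_tol       # looking farther: keep farther pixels
        if gaze_z < -gaze_tol:
            return diff < -depth_tol      # looking nearer: keep nearer pixels
        return abs(diff) <= depth_tol     # looking sideways: keep similar depth
    return [[1.0 if keep(d) else 0.0 for d in row] for row in depth]

def dual_attention(fov, depth_mask):
    """Combine the two maps by element-wise product."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(fov, depth_mask)]

# Toy 3x3 scene: head at the top-left pixel, gaze pointing right and away.
planar, gaze_z = decompose_gaze((1.0, 0.0, 0.6))
fov = fov_attention((0, 0), planar, width=3, height=3)
depth = [[0.5, 0.5, 0.9],
         [0.5, 0.5, 0.5],
         [0.5, 0.5, 0.5]]
mask = depth_attention(depth, head_depth=0.5, gaze_z=gaze_z)
attention = dual_attention(fov, mask)
```

In this toy scene, the pixel directly to the right of the head at the same depth lies in the field of view but is suppressed by the depth mask, while the farther pixel along the same direction is kept.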

Cite

Text

Fang et al. "Dual Attention Guided Gaze Target Detection in the Wild." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01123

Markdown

[Fang et al. "Dual Attention Guided Gaze Target Detection in the Wild." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/fang2021cvpr-dual/) doi:10.1109/CVPR46437.2021.01123

BibTeX

@inproceedings{fang2021cvpr-dual,
  title     = {{Dual Attention Guided Gaze Target Detection in the Wild}},
  author    = {Fang, Yi and Tang, Jiapeng and Shen, Wang and Shen, Wei and Gu, Xiao and Song, Li and Zhai, Guangtao},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {11390--11399},
  doi       = {10.1109/CVPR46437.2021.01123},
  url       = {https://mlanthology.org/cvpr/2021/fang2021cvpr-dual/}
}