Object-Aware Gaze Target Detection

Abstract

Gaze target detection aims to predict the image location where a person is looking and the probability that the gaze falls outside the scene. Several works have tackled this task by regressing a gaze heatmap centered on the gaze location; however, they overlook decoding the relationship between people and the gazed objects. This paper proposes a Transformer-based architecture that automatically detects objects (including heads) in the scene to build associations between every head and the gazed head or object, resulting in a comprehensive, explainable gaze analysis composed of the gaze target area, the gaze pixel point, and the class and image location of the gazed object. When evaluated on in-the-wild benchmarks, our method achieves state-of-the-art results on all metrics (up to 2.91% gain in AUC, 50% reduction in gaze distance, and 9% gain in out-of-frame average precision) for gaze target detection, and an 11-13% improvement in average precision for the classification and localization of gazed objects. The code of the proposed method is publicly available.
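
To make the heatmap formulation mentioned in the abstract concrete, here is a minimal sketch of a gaze heatmap centered on a gaze location, with the gaze pixel point read back as its peak. This is not the authors' implementation. The Gaussian shape, the 64x64 resolution, the `sigma` value, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_heatmap(height, width, center_xy, sigma=3.0):
    """Build a (height, width) heatmap with a Gaussian peak at center_xy.

    center_xy is (x, y) in pixel coordinates. The Gaussian form and the
    default sigma are illustrative assumptions, not values from the paper.
    """
    xs = np.arange(width, dtype=np.float64)            # shape (width,)
    ys = np.arange(height, dtype=np.float64)[:, None]  # shape (height, 1)
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def heatmap_argmax(heatmap):
    """Return the (x, y) pixel with the highest heatmap value."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)


def normalized_distance(pred_xy, true_xy, width, height):
    """Euclidean distance between two points after scaling x and y to [0, 1]."""
    px, py = pred_xy
    tx, ty = true_xy
    return float(np.hypot((px - tx) / width, (py - ty) / height))


# Hypothetical example: a 64x64 target heatmap for a gaze point at (40, 22).
target = gaussian_heatmap(64, 64, (40, 22))
predicted_point = heatmap_argmax(target)
error = normalized_distance(predicted_point, (40, 22), 64, 64)
```

Under these assumptions, the peak of the target heatmap is the annotated gaze point itself, so `error` is zero here. A model trained to regress such a target would be scored by how far its own peak lands from the annotation.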

Cite

Text

Tonini et al. "Object-Aware Gaze Target Detection." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01998

Markdown

[Tonini et al. "Object-Aware Gaze Target Detection." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/tonini2023iccv-objectaware/) doi:10.1109/ICCV51070.2023.01998

BibTeX

@inproceedings{tonini2023iccv-objectaware,
  title     = {{Object-Aware Gaze Target Detection}},
  author    = {Tonini, Francesco and Dall'Asen, Nicola and Beyan, Cigdem and Ricci, Elisa},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {21860--21869},
  doi       = {10.1109/ICCV51070.2023.01998},
  url       = {https://mlanthology.org/iccv/2023/tonini2023iccv-objectaware/}
}