Interaction-Aware Dynamic 3D Gaze Estimation in Videos

Abstract

Human gaze during in-the-wild and outdoor activities is a continuous, dynamic process driven by anatomical eye movements such as fixations, saccades, and smooth pursuit. However, learning gaze dynamics from videos remains challenging because annotating human gaze in videos is labor-intensive. In this paper, we propose a novel method for dynamic 3D gaze estimation in videos that leverages human interaction labels. Our model contains a temporal gaze estimator built upon an autoregressive Transformer structure. In addition, our model learns the spatial relationships of gaze among multiple subjects by constructing a Human Interaction Graph from predicted gaze and updating the gaze features with a structure-aware Transformer. The model predicts future gaze conditioned on historical gaze and gaze interactions in an autoregressive manner. We propose a multi-stage training algorithm that alternately updates the interaction module and the dynamic gaze estimation module when training on a mixture of labeled and unlabeled sequences. We show significant improvements in both within-domain gaze estimation accuracy and cross-domain generalization on the physically-unconstrained gaze estimation benchmark.
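
The abstract describes two components: an autoregressive Transformer over per-frame gaze features and an interaction module that shares gaze information across subjects. Below is a minimal PyTorch sketch of that two-part structure; the class names, feature dimensions, and the use of plain multi-head attention in place of the paper's Human Interaction Graph and structure-aware Transformer are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalGazeEstimator(nn.Module):
    """Autoregressive Transformer over per-frame gaze features (hypothetical layer sizes)."""
    def __init__(self, feat_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 3)  # 3D gaze direction per frame

    def forward(self, feats):
        # feats: (batch, time, feat_dim); a causal mask restricts each frame
        # to attend only to past frames, i.e. autoregressive conditioning.
        T = feats.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.encoder(feats, mask=causal)
        gaze = nn.functional.normalize(self.head(h), dim=-1)  # unit 3D gaze vectors
        return gaze, h

class InteractionGraphUpdate(nn.Module):
    """Cross-subject attention standing in for the interaction-graph update (a sketch)."""
    def __init__(self, feat_dim=128, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, subject_feats):
        # subject_feats: (frames, subjects, feat_dim); each subject's feature is
        # refined by attending to the other subjects in the same frame.
        updated, _ = self.attn(subject_feats, subject_feats, subject_feats)
        return subject_feats + updated

# Toy usage on random features: 2 subjects, 16 frames, 128-dim per-frame features.
feats = torch.randn(2, 16, 128)                              # (subjects, time, feat_dim)
gaze, hidden = TemporalGazeEstimator()(feats)                # per-subject autoregressive gaze
refined = InteractionGraphUpdate()(hidden.transpose(0, 1))   # attend across subjects per frame
```

In the paper's alternating training scheme, one would update the interaction module and the temporal gaze estimator in separate stages over the mixed labeled/unlabeled data; the sketch above only illustrates the forward structure, not that training loop.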

Cite

Text

Kuang et al. "Interaction-Aware Dynamic 3D Gaze Estimation in Videos." NeurIPS 2023 Workshops: Gaze_Meets_ML, 2023.

Markdown

[Kuang et al. "Interaction-Aware Dynamic 3D Gaze Estimation in Videos." NeurIPS 2023 Workshops: Gaze_Meets_ML, 2023.](https://mlanthology.org/neuripsw/2023/kuang2023neuripsw-interactionaware/)

BibTeX

@inproceedings{kuang2023neuripsw-interactionaware,
  title     = {{Interaction-Aware Dynamic 3D Gaze Estimation in Videos}},
  author    = {Kuang, Chenyi and Kephart, Jeffrey O. and Ji, Qiang},
  booktitle = {NeurIPS 2023 Workshops: Gaze_Meets_ML},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/kuang2023neuripsw-interactionaware/}
}