MTGLS: Multi-Task Gaze Estimation with Limited Supervision

Abstract

Robust gaze estimation is a challenging task, even for deep CNNs, due to the unavailability of large-scale labeled data. Moreover, gaze annotation is a time-consuming process and requires specialized hardware setups. We propose MTGLS: a Multi-Task Gaze estimation framework with Limited Supervision, which leverages abundantly available non-annotated facial image data. MTGLS distills knowledge from off-the-shelf facial image analysis models and learns strong feature representations of human eyes, guided by three complementary auxiliary signals: (a) the line of sight of the pupil (i.e. pseudo-gaze) defined by the localized facial landmarks, (b) the head pose given by Euler angles, and (c) the orientation of the eye patch (left/right eye). To overcome inherent noise in these supervisory signals, MTGLS further incorporates a noise distribution modelling approach. Our experimental results show that MTGLS learns highly generalized representations which consistently perform well on a range of datasets. Our proposed framework outperforms the unsupervised state of the art on the CAVE dataset (by approx. 6.43%) and even supervised state-of-the-art methods on the Gaze360 dataset (by approx. 6.59%).
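The abstract describes training with a weighted combination of three auxiliary objectives: pseudo-gaze regression, head-pose (Euler angle) regression, and left/right eye-patch classification. As a rough illustration only, the sketch below shows one plausible form of such a combined loss; the specific loss functions, weights, and dictionary keys (`gaze`, `pose`, `eye`) are assumptions for this example, not the paper's exact objective.

```python
import numpy as np

def multitask_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Illustrative combined loss over the three auxiliary signals.

    NOTE: loss forms and weights are hypothetical; the paper's actual
    objective (including its noise-distribution modelling) may differ.
    """
    # L2 loss on pseudo-gaze direction (e.g. yaw, pitch)
    gaze_loss = np.mean((pred["gaze"] - target["gaze"]) ** 2)
    # L2 loss on head-pose Euler angles (yaw, pitch, roll)
    pose_loss = np.mean((pred["pose"] - target["pose"]) ** 2)
    # Binary cross-entropy on left/right eye-patch orientation
    p = np.clip(pred["eye"], 1e-7, 1 - 1e-7)
    eye_loss = -np.mean(target["eye"] * np.log(p)
                        + (1 - target["eye"]) * np.log(1 - p))
    w_g, w_p, w_e = weights
    return w_g * gaze_loss + w_p * pose_loss + w_e * eye_loss
```

In practice such a loss would sit on top of a shared eye-image encoder with one small head per task, so the auxiliary signals jointly shape the learned representation.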

Cite

Text

Ghosh et al. "MTGLS: Multi-Task Gaze Estimation with Limited Supervision." Winter Conference on Applications of Computer Vision, 2022.

Markdown

[Ghosh et al. "MTGLS: Multi-Task Gaze Estimation with Limited Supervision." Winter Conference on Applications of Computer Vision, 2022.](https://mlanthology.org/wacv/2022/ghosh2022wacv-mtgls/)

BibTeX

@inproceedings{ghosh2022wacv-mtgls,
  title     = {{MTGLS: Multi-Task Gaze Estimation with Limited Supervision}},
  author    = {Ghosh, Shreya and Hayat, Munawar and Dhall, Abhinav and Knibbe, Jarrod},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2022},
  pages     = {3223--3234},
  url       = {https://mlanthology.org/wacv/2022/ghosh2022wacv-mtgls/}
}