Towards Unsupervised Eye-Region Segmentation for Eye Tracking

Abstract

Locating the eye and parsing its parts (e.g., pupil and iris) is a key prerequisite for image-based eye tracking, which has become an indispensable module in today’s head-mounted VR/AR devices. However, the typical route to training a segmenter requires tedious hand-labeling. In this work, we explore an unsupervised alternative. First, we exploit priors of the human eye and extract signals from the image to establish rough clues indicating the eye-region structure. From these sparse and noisy clues, a segmentation network is trained to gradually identify the precise area of each part. To achieve accurate parsing of the eye region, we first leverage the pretrained foundation model Segment Anything (SAM) in an automatic fashion to refine the eye indications. The learning process is then designed end-to-end, following progressive and prior-aware principles. Experiments show that our unsupervised approach achieves 90% (pupil and iris) and 85% (whole eye region) of the performance of supervised learning.

Cite

Text

Deng et al. "Towards Unsupervised Eye-Region Segmentation for Eye Tracking." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91989-3_13

Markdown

[Deng et al. "Towards Unsupervised Eye-Region Segmentation for Eye Tracking." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/deng2024eccvw-unsupervised/) doi:10.1007/978-3-031-91989-3_13

BibTeX

@inproceedings{deng2024eccvw-unsupervised,
  title     = {{Towards Unsupervised Eye-Region Segmentation for Eye Tracking}},
  author    = {Deng, Jiangfan and Jia, Zhuang and Wang, Zhaoxue and Long, Xiang and Du, Daniel K.},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {199--213},
  doi       = {10.1007/978-3-031-91989-3_13},
  url       = {https://mlanthology.org/eccvw/2024/deng2024eccvw-unsupervised/}
}