One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers

Abstract

Human faces encode a vast amount of information, including not only uniquely distinctive features of the individual but also demographic information such as a person's age, gender, and weight. Such information is referred to as soft biometrics: physical, behavioral, or adhered human characteristics that can be classified into predefined, human-compliant categories. As the saying goes, 'one look is worth a thousand words'. Vision Transformers have emerged as a powerful deep learning architecture able to achieve accurate classification in different computer vision tasks, but these models have not yet been applied to soft biometrics. In this work, we propose the Bidirectional Encoder Face representation from image Transformers (BEFiT), a model that leverages multi-attention mechanisms to capture local and global features and produce a multi-purpose face embedding. This single embedding enables the estimation of different demographics without having to re-train the model for each soft-biometric trait, ensuring high efficiency without compromising accuracy. Our approach explores the use of visible and thermal images to obtain powerful face embeddings across different light spectra. We demonstrate that the BEFiT embeddings capture essential information for gender, age, and weight estimation, surpassing the performance of dedicated deep learning structures designed to estimate a single soft-biometric trait. The code of the BEFiT implementation is publicly available.
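The idea of one shared embedding serving several traits can be illustrated with a minimal sketch. It is not the BEFiT implementation: `embed` is a hypothetical stand-in for a frozen face encoder, the k-nearest-neighbour heads are an assumed choice of lightweight per-trait estimator, and all images and labels are synthetic toy values. It only shows that the encoder runs once per image and is never re-trained when a new trait is added.

```python
from math import dist
from statistics import mean


def embed(image):
    """Hypothetical stand-in for a frozen face encoder (NOT BEFiT).

    Splits a flat pixel list into 4 chunks and returns the mean of each,
    giving a fixed-length vector.
    """
    n = len(image) // 4
    return [mean(image[i * n:(i + 1) * n]) for i in range(4)]


class KNNHead:
    """Lightweight per-trait head fitted on fixed embeddings (assumed choice)."""

    def __init__(self, k=1, classify=False):
        self.k = k
        self.classify = classify
        self.data = []

    def fit(self, embeddings, targets):
        self.data = list(zip(embeddings, targets))
        return self

    def predict(self, embedding):
        # Keep the k stored embeddings closest to the query.
        nearest = sorted(self.data, key=lambda p: dist(p[0], embedding))[:self.k]
        values = [target for _, target in nearest]
        if self.classify:
            return max(set(values), key=values.count)  # majority vote
        return mean(values)  # average of neighbour targets


# Synthetic toy data, for illustration only.
images = [[0.1] * 8, [0.2] * 8, [0.8] * 8, [0.9] * 8]
labels = {
    "gender": ["f", "f", "m", "m"],
    "age": [25.0, 30.0, 50.0, 55.0],
    "weight": [55.0, 60.0, 80.0, 85.0],
}

# The encoder runs once per training image; only the heads are fitted.
train_embeddings = [embed(img) for img in images]
heads = {
    "gender": KNNHead(k=1, classify=True).fit(train_embeddings, labels["gender"]),
    "age": KNNHead(k=2).fit(train_embeddings, labels["age"]),
    "weight": KNNHead(k=2).fit(train_embeddings, labels["weight"]),
}


def predict_all(image):
    """Compute one embedding and reuse it for every trait."""
    embedding = embed(image)
    return {trait: head.predict(embedding) for trait, head in heads.items()}


result = predict_all([0.85] * 8)
```

Adding another trait in this sketch means fitting one more head on the stored embeddings, while `embed` stays untouched.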

Cite

Text

Mirabet-Herranz et al. "One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00157

Markdown

[Mirabet-Herranz et al. "One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/mirabetherranz2024cvprw-one/) doi:10.1109/CVPRW63382.2024.00157

BibTeX

@inproceedings{mirabetherranz2024cvprw-one,
  title     = {{One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers}},
  author    = {Mirabet-Herranz, Nélida and Galdi, Chiara and Dugelay, Jean-Luc},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {1500-1509},
  doi       = {10.1109/CVPRW63382.2024.00157},
  url       = {https://mlanthology.org/cvprw/2024/mirabetherranz2024cvprw-one/}
}