SapiensID: Foundation for Human Recognition

Abstract

Existing human recognition systems often rely on separate, specialized models for face and body analysis, limiting their effectiveness in real-world scenarios where pose, visibility, and context vary widely. This paper introduces SapiensID, a unified model that bridges this gap, achieving robust performance across diverse settings. SapiensID introduces (i) Retina Patch (RP), a dynamic patch generation scheme that adapts to subject scale and ensures consistent tokenization of regions of interest; (ii) Semantic Attention Head (SAH), an attention mechanism that learns pose-invariant representations by pooling features around key body parts; and (iii) a masked recognition model (MRM) that learns from variable token lengths. To facilitate training, we introduce WebBody4M, a large-scale dataset capturing diverse poses and scale variations. Extensive experiments demonstrate that SapiensID achieves state-of-the-art results on various body ReID benchmarks, outperforming specialized models in both short-term and long-term scenarios while remaining competitive with dedicated face recognition systems. Furthermore, SapiensID establishes a strong baseline for the newly introduced challenge of Cross Pose-Scale ReID, demonstrating its ability to generalize to complex, real-world conditions. The dataset, code, and models will be released.
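
As a rough illustration of what "pooling features around key body parts" can mean, the sketch below averages patch tokens with weights that decay with distance from each keypoint. This is not the paper's Semantic Attention Head, whose attention is learned; the fixed Gaussian weighting, the function name `pool_around_keypoints`, the `sigma` parameter, and the normalized coordinate convention are all assumptions made for the example.

```python
import math

def pool_around_keypoints(tokens, centers, keypoints, sigma=0.1):
    """Pool patch tokens around each keypoint (illustrative only).

    tokens:    list of N feature vectors, each a list of D floats
    centers:   list of N (x, y) patch centers in normalized [0, 1] coordinates
    keypoints: list of K (x, y) keypoint locations in the same coordinates
    sigma:     hypothetical bandwidth controlling how local the pooling is
    Returns a list of K pooled feature vectors of length D.
    """
    dim = len(tokens[0])
    pooled = []
    for kx, ky in keypoints:
        # Logits decrease with squared distance from the keypoint.
        logits = [
            -((cx - kx) ** 2 + (cy - ky) ** 2) / (2.0 * sigma ** 2)
            for cx, cy in centers
        ]
        # Numerically stable softmax over the N patches.
        peak = max(logits)
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of token features, one dimension at a time.
        pooled.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return pooled
```

Because each output vector is tied to a body part rather than to a fixed image location, the pooled representation has the same layout regardless of where the part appears in the frame, which is the intuition behind pose invariance here.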

Cite

Text

Kim et al. "SapiensID: Foundation for Human Recognition." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01301

Markdown

[Kim et al. "SapiensID: Foundation for Human Recognition." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/kim2025cvpr-sapiensid/) doi:10.1109/CVPR52734.2025.01301

BibTeX

@inproceedings{kim2025cvpr-sapiensid,
  title     = {{SapiensID: Foundation for Human Recognition}},
  author    = {Kim, Minchul and Ye, Dingqiang and Su, Yiyang and Liu, Feng and Liu, Xiaoming},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {13937-13947},
  doi       = {10.1109/CVPR52734.2025.01301},
  url       = {https://mlanthology.org/cvpr/2025/kim2025cvpr-sapiensid/}
}