HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes

Abstract

The aspiration for artificial general intelligence, fueled by the rapid progress of multimodal understanding, demands that models understand humans in diverse and complex scenarios, as humans manifest intelligence and embody the world. We propose HumanPCR, an evaluation suite for probing MLLMs’ capacity in human-centric visual contexts across three hierarchical levels: Perception, Comprehension, and Reasoning (denoted Human-P, Human-C, and Human-R, respectively). Human-P and Human-C consist of over 6,000 multiple-choice questions evaluating 34 fine-grained tasks covering 9 essential dimensions. Human-R presents a manually curated, challenging video reasoning test that requires integrating multiple pieces of visual evidence, proactively extracting implicit context beyond question cues, and applying human-like expertise. Each question includes human-annotated Chain-of-Thought (CoT) rationales with key visual evidence to support further research. Extensive evaluations of over 30 state-of-the-art models reveal significant challenges in human-centric visual understanding, particularly in tasks involving detailed space perception, temporal understanding, and mind modeling. The analysis of Human-R further exposes a critical failure in reasoning: models struggle to proactively gather the necessary visual evidence and instead show a faulty reliance on query-prompted cues, with advanced techniques offering only marginal gains. We hope HumanPCR and our findings will advance the development, evaluation, and human-centric applications of multimodal models.
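Because Human-P and Human-C are scored as multiple-choice questions grouped into levels and fine-grained tasks, per-group accuracy is the natural summary statistic. The sketch below shows one way to compute it. It is not the official HumanPCR evaluation code: the `MCQRecord` layout, its field names, the task labels, and the helper `accuracy_by_group` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQRecord:
    # Hypothetical record layout; NOT the official HumanPCR schema.
    level: str       # e.g. "Human-P" or "Human-C"
    task: str        # fine-grained task name (hypothetical label)
    answer: str      # ground-truth option letter, e.g. "B"
    prediction: str  # option letter chosen by the model


def accuracy_by_group(records, key):
    """Return {group: correct / total}, grouping records by key(record)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = key(r)
        total[group] += 1
        # Case- and whitespace-insensitive match on the option letter.
        if r.prediction.strip().upper() == r.answer.strip().upper():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


if __name__ == "__main__":
    records = [
        MCQRecord("Human-P", "task_a", "A", "A"),
        MCQRecord("Human-P", "task_a", "B", "C"),
        MCQRecord("Human-C", "task_b", "D", "d"),
        MCQRecord("Human-C", "task_b", "C", "C"),
    ]
    print(accuracy_by_group(records, key=lambda r: r.level))
    # {'Human-P': 0.5, 'Human-C': 1.0}
```

Passing `key=lambda r: r.task` instead gives per-task accuracy. Grouping by an arbitrary key function keeps the level, task, and dimension breakdowns in a single helper.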

Cite

Text

Li et al. "HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-humanpcr/)

BibTeX

@inproceedings{li2026iclr-humanpcr,
  title     = {{HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes}},
  author    = {Li, Keliang and Shen, Hongze and Shi, Hao and Hou, RuiBing and Chang, Hong and Huang, Jie and Jia, Chenghao and Wang, Wen and Wu, Yiling and Jiang, Dongmei and Shan, Shiguang and Chen, Xilin},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-humanpcr/}
}