You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

Abstract

Human-centric perception (detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision. This paper introduces a unified and versatile framework (HQNet) for single-stage multi-person multi-task human-centric perception (HCP). Our approach centers on learning a unified human query representation, denoted as Human Query, which captures intricate instance-level features for individual persons and disentangles complex multi-person scenarios. Although different HCP tasks have been well-studied individually, single-stage multi-task learning of HCP tasks has not been fully exploited in the literature due to the absence of a comprehensive benchmark dataset. To address this gap, we propose COCO-UniHuman benchmark to enable model development and comprehensive evaluation. Experimental results demonstrate the proposed method’s state-of-the-art performance among multi-task HCP models and its competitive performance compared to task-specific HCP models. Moreover, our experiments underscore Human Query’s adaptability to new HCP tasks, thus demonstrating its robust generalization capability. Codes and data are available at https://github.com/lishuhuai527/COCO-UniHuman.

Cite

Text

Jin et al. "You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72649-1_8

Markdown

[Jin et al. "You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/jin2024eccv-you/) doi:10.1007/978-3-031-72649-1_8

BibTeX

@inproceedings{jin2024eccv-you,
  title     = {{You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception}},
  author    = {Jin, Sheng and Li, Shuhuai and Li, Tong and Liu, Wentao and Qian, Chen and Luo, Ping},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72649-1_8},
  url       = {https://mlanthology.org/eccv/2024/jin2024eccv-you/}
}