Leveraging Prior Knowledge of Diffusion Model for Person Search

Abstract

Person search jointly performs person detection and re-identification: it localizes and identifies a query person within a gallery of uncropped scene images. Existing methods predominantly rely on ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context and fine-grained identity cues that person search requires. Moreover, they share a single backbone feature between person detection and re-identification, which degrades the features because the two optimization objectives conflict. In this paper, we propose DiffPS (Diffusion Prior Knowledge for Person Search), a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between the two sub-tasks. We analyze key properties of diffusion priors and propose three specialized modules: (i) a Diffusion-Guided Region Proposal Network (DGRPN) for enhanced person localization, (ii) a Multi-Scale Frequency Refinement Network (MSFRN) to mitigate shape bias, and (iii) a Semantic-Adaptive Feature Aggregation Network (SFAN) to leverage text-aligned diffusion features. DiffPS sets a new state of the art on CUHK-SYSU and PRW.

Cite

Text

Kim et al. "Leveraging Prior Knowledge of Diffusion Model for Person Search." International Conference on Computer Vision, 2025.

Markdown

[Kim et al. "Leveraging Prior Knowledge of Diffusion Model for Person Search." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/kim2025iccv-leveraging/)

BibTeX

@inproceedings{kim2025iccv-leveraging,
  title     = {{Leveraging Prior Knowledge of Diffusion Model for Person Search}},
  author    = {Kim, Giyeol and Yang, Sooyoung and Oh, Jihyong and Kang, Myungjoo and Eom, Chanho},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {20301--20312},
  url       = {https://mlanthology.org/iccv/2025/kim2025iccv-leveraging/}
}