Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos

Abstract

Video-based person re-identification aims to match pedestrians from video sequences across non-overlapping camera views. The key to video person re-identification is to effectively exploit both spatial and temporal clues from video sequences. In this work, we propose a novel Spatial-Temporal Correlation and Topology Learning framework (CTL) that pursues discriminative and robust representations by modeling cross-scale spatial-temporal correlation. Specifically, CTL uses a CNN backbone and a key-points estimator to extract semantic local features from the human body at multiple granularities as graph nodes. It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body. Moreover, a 3D graph convolution and a cross-scale graph convolution are designed, which facilitate direct cross-spacetime and cross-scale information propagation for capturing hierarchical spatial-temporal dependencies and structural information. By jointly performing the two convolutions, CTL effectively mines comprehensive clues that are complementary to appearance information to enhance representational capacity. Extensive experiments on two video benchmarks demonstrate the effectiveness of the proposed method and its state-of-the-art performance.
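
To make the idea of cross-spacetime propagation more concrete, the following is a minimal NumPy sketch of a generic graph convolution over a graph whose nodes are body parts in consecutive frames. It is not the authors' 3D graph convolution or their context-reinforced topology: the function names, the choice to link each part to itself and its skeleton neighbours in adjacent frames, the symmetric normalization, and the ReLU are all assumptions made for illustration.

```python
import numpy as np


def spacetime_adjacency(spatial_adj, num_frames):
    """Build a (T*N, T*N) adjacency from an (N, N) skeleton adjacency.

    Assumption for illustration: a part in frame t is linked to its
    skeleton neighbours in frame t, and to itself and those neighbours
    in frames t-1 and t+1. `spatial_adj` is binary with a zero diagonal.
    """
    n = spatial_adj.shape[0]
    # Frame t is connected to frames t-1, t, and t+1.
    temporal = (np.eye(num_frames)
                + np.eye(num_frames, k=1)
                + np.eye(num_frames, k=-1))
    # Within a linked frame pair, connect a part to itself and its neighbours.
    block = spatial_adj + np.eye(n)
    full = np.kron(temporal, block)
    # Self-loops are added later, during normalization.
    np.fill_diagonal(full, 0.0)
    return full


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(features, adj_norm, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    return np.maximum(adj_norm @ features @ weight, 0.0)


if __name__ == "__main__":
    # Hypothetical 3-part chain (e.g. head - torso - legs) over 2 frames.
    skeleton = np.array([[0, 1, 0],
                         [1, 0, 1],
                         [0, 1, 0]], dtype=float)
    adj = normalize_adjacency(spacetime_adjacency(skeleton, num_frames=2))
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 4))   # 6 nodes (2 frames x 3 parts), 4-dim features
    w = rng.normal(size=(4, 2))   # project features to 2 dimensions
    print(graph_conv(x, adj, w).shape)
```

In this sketch a single propagation step already mixes information across both space and time, because spatial and temporal links live in the same adjacency matrix rather than being applied in separate stages.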

Cite

Text

Liu et al. "Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00435

Markdown

[Liu et al. "Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/liu2021cvpr-spatialtemporal/) doi:10.1109/CVPR46437.2021.00435

BibTeX

@inproceedings{liu2021cvpr-spatialtemporal,
  title     = {{Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos}},
  author    = {Liu, Jiawei and Zha, Zheng-Jun and Wu, Wei and Zheng, Kecheng and Sun, Qibin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {4370--4379},
  doi       = {10.1109/CVPR46437.2021.00435},
  url       = {https://mlanthology.org/cvpr/2021/liu2021cvpr-spatialtemporal/}
}