Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification

Abstract

Video-based person re-identification (reID) aims at matching the same person across video clips. It is a challenging task due to the existence of redundancy among frames, newly revealed appearance, occlusion, and motion blur. In this paper, we propose an attentive feature aggregation module, namely Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA), to delicately aggregate spatio-temporal features into a discriminative video-level feature representation. In order to determine the contribution/importance of a spatio-temporal feature node, we propose to learn the attention from a global view with convolutional operations. Specifically, we stack its relations, i.e., pairwise correlations with respect to a representative set of reference feature nodes (S-RFNs) that represents global video information, together with the feature itself to infer the attention. Moreover, to exploit the semantics of different levels, we propose to learn multi-granularity attentions based on the relations captured at different granularities. Extensive ablation studies demonstrate the effectiveness of our attentive feature aggregation module MG-RAFA. Our framework achieves state-of-the-art performance on three benchmark datasets.
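The attention step described in the abstract (correlate each feature node with a set of reference nodes, stack those relations with the node's own feature, and infer a weight) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `reference_aided_aggregate`, the way reference nodes are built (means over contiguous chunks of nodes), the dot-product correlation, the softmax normalization, and the random untrained weights standing in for the learned convolutional layers are all assumptions made for illustration. It covers a single granularity only.

```python
import numpy as np


def reference_aided_aggregate(features, num_refs=4, seed=0):
    """Aggregate (N, C) spatio-temporal feature nodes into one (C,) vector.

    Illustrative only: every design choice below is an assumption,
    not the method from the paper.
    """
    n, c = features.shape
    if n < num_refs:
        raise ValueError("need at least as many feature nodes as reference nodes")

    # Assumed reference set: the mean of each contiguous chunk of nodes,
    # standing in for a representative set of reference feature nodes.
    chunks = np.array_split(features, num_refs, axis=0)
    refs = np.stack([chunk.mean(axis=0) for chunk in chunks])  # (num_refs, C)

    # Relations: pairwise dot-product correlation of each node with each reference.
    relations = features @ refs.T  # (N, num_refs)

    # Stack the relations together with the feature itself.
    stacked = np.concatenate([features, relations], axis=1)  # (N, C + num_refs)

    # Placeholder for the learned attention model: a random, untrained
    # linear map producing one score per node.
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal(c + num_refs) / np.sqrt(c + num_refs)
    scores = stacked @ weights  # (N,)

    # Softmax over nodes (an assumed normalization), shifted for stability.
    scores = scores - scores.max()
    attention = np.exp(scores)
    attention = attention / attention.sum()

    # Attention-weighted sum of the nodes gives the video-level feature.
    video_feature = attention @ features  # (C,)
    return video_feature, attention
```

For example, calling `reference_aided_aggregate` on a random array of shape `(12, 8)` returns a video-level vector of shape `(8,)` and a non-negative attention vector of shape `(12,)` that sums to one. In a trained model the placeholder weights would be replaced by learned parameters.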

Cite

Text

Zhang et al. "Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01042

Markdown

[Zhang et al. "Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/zhang2020cvpr-multigranularity/) doi:10.1109/CVPR42600.2020.01042

BibTeX

@inproceedings{zhang2020cvpr-multigranularity,
  title     = {{Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-Based Person Re-Identification}},
  author    = {Zhang, Zhizheng and Lan, Cuiling and Zeng, Wenjun and Chen, Zhibo},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.01042},
  url       = {https://mlanthology.org/cvpr/2020/zhang2020cvpr-multigranularity/}
}