View-Decoupled Transformer for Person Re-Identification Under Aerial-Ground Camera Network

Abstract

Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. The most significant challenge in AGPReID is the dramatic view discrepancy, which disrupts discriminative identity representation. To alleviate this, the view-decoupled transformer (VDT) is proposed as a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss: the former separates these two features inside the VDT, and the latter constrains them to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five aerial and eight ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, while keeping the same magnitude of computational complexity. Our project is available at https://github.com/LinlyAC/VDT-AGPReID
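The abstract names the two decoupling components but does not give their formulas. The following is a minimal sketch of the general idea only, not the authors' implementation: it assumes the view-unrelated feature is obtained by subtracting a view feature from a global feature, and that independence is encouraged by penalizing the cosine similarity between the two. The function names `subtractive_separation` and `orthogonal_loss`, and the NumPy arrays standing in for transformer tokens, are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against division by zero)."""
    norm = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norm, eps)


def subtractive_separation(global_feat, view_feat):
    """Assumed form of subtractive separation: remove the view-related
    component from the global feature to leave a view-unrelated one."""
    return global_feat - view_feat


def orthogonal_loss(identity_feat, view_feat):
    """Assumed form of an orthogonality penalty: mean absolute cosine
    similarity between paired rows. It is 0 when every pair is orthogonal."""
    a = l2_normalize(identity_feat)
    b = l2_normalize(view_feat)
    cos = np.sum(a * b, axis=1)
    return float(np.mean(np.abs(cos)))


# Toy batch of two samples with 2-D features (illustrative values only).
global_feat = np.array([[1.0, 3.0], [4.0, 2.0]])
view_feat = np.array([[0.0, 3.0], [4.0, 0.0]])

identity_feat = subtractive_separation(global_feat, view_feat)
loss_decoupled = orthogonal_loss(identity_feat, view_feat)
loss_entangled = orthogonal_loss(view_feat, view_feat)
```

In this toy batch the separated features are exactly orthogonal to the view features, so the penalty vanishes, while comparing a feature with itself gives the maximum penalty. In a real model the penalty would be one term added to the usual identity losses.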

Cite

Text

Zhang et al. "View-Decoupled Transformer for Person Re-Identification Under Aerial-Ground Camera Network." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02077

Markdown

[Zhang et al. "View-Decoupled Transformer for Person Re-Identification Under Aerial-Ground Camera Network." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhang2024cvpr-viewdecoupled/) doi:10.1109/CVPR52733.2024.02077

BibTeX

@inproceedings{zhang2024cvpr-viewdecoupled,
  title     = {{View-Decoupled Transformer for Person Re-Identification Under Aerial-Ground Camera Network}},
  author    = {Zhang, Quan and Wang, Lei and Patel, Vishal M. and Xie, Xiaohua and Lai, Jianhuang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {22000--22009},
  doi       = {10.1109/CVPR52733.2024.02077},
  url       = {https://mlanthology.org/cvpr/2024/zhang2024cvpr-viewdecoupled/}
}