Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
Abstract
In this paper, we construct a large-scale benchmark dataset for Ground-to-Aerial Video-based person Re-Identification, named G2A-VReID, which comprises 185,907 images and 5,576 tracklets covering 2,788 distinct identities. To our knowledge, this is the first dataset for video ReID under Ground-to-Aerial scenarios. The G2A-VReID dataset has the following characteristics: 1) drastic view changes; 2) a large number of annotated identities; 3) rich outdoor scenarios; 4) a huge difference in resolution. Additionally, we propose a new benchmark approach for cross-platform ReID, termed VSLA-CLIP, which transforms the cross-platform visual alignment problem into visual-semantic alignment through a vision-language model (i.e., CLIP) and applies a parameter-efficient Video Set-Level-Adapter module to adapt the image-based foundation model to video ReID tasks. To further reduce the large discrepancy across platforms, we also devise platform-bridge prompts for efficient visual feature alignment. Extensive experiments demonstrate the superiority of the proposed method on all existing video ReID datasets and our proposed G2A-VReID dataset. The code and datasets are available at https://github.com/FHR-L/VSLA-CLIP.
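To give a rough sense of scale, the dataset figures quoted in the abstract can be turned into per-identity and per-tracklet averages. The sketch below uses only the three counts stated above; the variable names are illustrative, and the results are plain averages that say nothing about how tracklets or images are actually distributed across identities or platforms.

```python
# Counts quoted in the abstract (the only inputs used here).
num_images = 185_907
num_tracklets = 5_576
num_identities = 2_788

# Simple averages; the real per-identity distribution is not given in the abstract.
tracklets_per_identity = num_tracklets / num_identities
images_per_tracklet = num_images / num_tracklets
images_per_identity = num_images / num_identities

print(f"tracklets per identity: {tracklets_per_identity:.2f}")
print(f"images per tracklet:    {images_per_tracklet:.2f}")
print(f"images per identity:    {images_per_identity:.2f}")
```

On average this gives two tracklets per identity and roughly 33 images per tracklet. The abstract does not state how tracklets are split between the ground and aerial platforms, so these averages should not be read as a per-platform breakdown.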
Cite
Text
Zhang et al. "Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73383-3_16
Markdown
[Zhang et al. "Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-crossplatform/) doi:10.1007/978-3-031-73383-3_16
BibTeX
@inproceedings{zhang2024eccv-crossplatform,
title = {{Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach}},
author = {Zhang, Shizhou and Luo, Wenlong and Cheng, De and Yang, Qingchun and Ran, Lingyan and Xing, Yinghui and Zhang, Yanning},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73383-3_16},
url = {https://mlanthology.org/eccv/2024/zhang2024eccv-crossplatform/}
}