CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception

Abstract

Cooperative perception aims to address the inherent limitations of single-vehicle autonomous driving systems through information exchange among multiple agents. Previous research has primarily focused on single-frame perception tasks. However, the more challenging cooperative sequential perception tasks, such as cooperative 3D multi-object tracking, have not been thoroughly investigated. Therefore, we propose CoopTrack, a fully instance-level end-to-end framework for cooperative tracking, featuring learnable instance association, which fundamentally differs from existing approaches. CoopTrack transmits sparse instance-level features that significantly enhance perception capabilities while maintaining low transmission costs. Furthermore, the framework comprises two key components: Multi-Dimensional Feature Extraction, and Cross-Agent Association and Aggregation, which collectively enable comprehensive instance representation with semantic and motion features, and adaptive cross-agent association and fusion based on a feature graph. Experiments on both the V2X-Seq and Griffin datasets demonstrate that CoopTrack achieves excellent performance. Specifically, it attains state-of-the-art results on V2X-Seq, with 39.0% mAP and 32.8% AMOTA. The project is available at https://github.com/zhongjiaru/CoopTrack.
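As a rough illustration of what instance-level cross-agent association and fusion involves, the sketch below matches per-instance feature vectors from two agents and merges the matched pairs. It is not the CoopTrack method: the abstract describes a learnable association over a feature graph, whereas this sketch swaps in fixed cosine similarity with greedy one-to-one matching and simple averaging. The function names (`associate`, `fuse`), the `threshold` parameter, and the toy feature vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(ego, other, threshold=0.5):
    """Greedy one-to-one matching of instances by descending cosine similarity.

    Assumption: a fixed similarity stands in for the learned affinity
    that the abstract describes.
    """
    candidates = sorted(
        ((cosine(e, o), i, j) for i, e in enumerate(ego) for j, o in enumerate(other)),
        reverse=True,
    )
    used_ego, used_other, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if i in used_ego or j in used_other:
            continue  # each instance may be matched at most once
        used_ego.add(i)
        used_other.add(j)
        matches.append((i, j))
    return sorted(matches)


def fuse(ego, other, matches):
    """Average matched pairs; keep unmatched instances from both agents as-is."""
    partner = dict(matches)
    matched_other = set(partner.values())
    fused = []
    for i, e in enumerate(ego):
        if i in partner:
            o = other[partner[i]]
            fused.append([(x + y) / 2 for x, y in zip(e, o)])
        else:
            fused.append(list(e))
    # Instances seen only by the other agent extend the ego agent's view.
    fused.extend(list(o) for j, o in enumerate(other) if j not in matched_other)
    return fused


# Toy instance features (hypothetical): two ego instances, two from another agent.
ego_feats = [[1.0, 0.0], [0.0, 1.0]]
other_feats = [[0.0, 2.0], [-1.0, -1.0]]
pairs = associate(ego_feats, other_feats)
merged = fuse(ego_feats, other_feats, pairs)
```

Only a handful of short vectors per instance cross the link here, which is the intuition behind the low transmission cost the abstract attributes to sparse instance-level features.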

Cite

Text

Zhong et al. "CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception." International Conference on Computer Vision, 2025.

Markdown

[Zhong et al. "CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhong2025iccv-cooptrack/)

BibTeX

@inproceedings{zhong2025iccv-cooptrack,
  title     = {{CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception}},
  author    = {Zhong, Jiaru and Wang, Jiahao and Xu, Jiahui and Li, Xiaofan and Nie, Zaiqing and Yu, Haibao},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {26954--26965},
  url       = {https://mlanthology.org/iccv/2025/zhong2025iccv-cooptrack/}
}