Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark

Abstract

To promote the development of object detection, tracking, and counting algorithms in drone-captured videos, we construct a benchmark with a new large-scale drone-captured dataset, named DroneCrowd, formed by 112 video clips with 33,600 HD frames in various scenarios. Notably, we annotate 20,800 person trajectories with 4.8 million heads and several video-level attributes. Meanwhile, we design the Space-Time Neighbor-Aware Network (STNNet) as a strong baseline to solve object detection, tracking, and counting jointly in dense crowds. STNNet consists of a feature extraction module, followed by density map estimation heads and localization and association subnets. To exploit the context information of neighboring objects, we design a neighboring context loss to guide training of the association subnet, which enforces consistent relative positions of nearby objects in the temporal domain. Extensive experiments on our DroneCrowd dataset demonstrate that STNNet performs favorably against state-of-the-art methods.
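The neighboring context loss described in the abstract can be sketched in a few lines: for each object, penalize changes over time in its relative position to its nearest neighbors. The following NumPy illustration is a hypothetical sketch of that idea, not the authors' implementation; the function name `neighbor_context_loss`, the use of k-nearest neighbors, and the L1 penalty are all assumptions.

```python
import numpy as np

def neighbor_context_loss(pos_t, pos_t1, k=2):
    """Hypothetical sketch of a neighboring context loss.

    pos_t, pos_t1: (N, 2) arrays holding the same N objects' head
    positions in two consecutive frames. For each object, the loss
    penalizes changes in its offsets to its k nearest neighbors
    (chosen in frame t), encouraging temporally consistent local
    crowd structure.
    """
    n = len(pos_t)
    loss = 0.0
    for i in range(n):
        # distances from object i to all others in frame t
        d = np.linalg.norm(pos_t - pos_t[i], axis=1)
        d[i] = np.inf  # exclude the object itself
        nbrs = np.argsort(d)[:k]  # k nearest neighbors in frame t
        # relative offsets to those neighbors in both frames
        off_t = pos_t[nbrs] - pos_t[i]
        off_t1 = pos_t1[nbrs] - pos_t1[i]
        # L1 penalty on the change of relative position over time
        loss += np.abs(off_t - off_t1).sum()
    return loss / n
```

Under this formulation, a crowd that moves rigidly (every head translated by the same vector) incurs zero loss, while an object drifting relative to its neighbors is penalized.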

Cite

Text

Wen et al. "Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00772

Markdown

[Wen et al. "Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/wen2021cvpr-detection/) doi:10.1109/CVPR46437.2021.00772

BibTeX

@inproceedings{wen2021cvpr-detection,
  title     = {{Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark}},
  author    = {Wen, Longyin and Du, Dawei and Zhu, Pengfei and Hu, Qinghua and Wang, Qilong and Bo, Liefeng and Lyu, Siwei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {7812--7821},
  doi       = {10.1109/CVPR46437.2021.00772},
  url       = {https://mlanthology.org/cvpr/2021/wen2021cvpr-detection/}
}