Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking

Abstract

Drone crowd tracking has various applications such as crowd management and video surveillance. Unlike in general multi-object tracking, the objects to be tracked are small, and the ground truth is given as point-level annotations, which carry no region information. This leads to a lack of discriminative features for distinguishing the same object from many similar objects. Thus, similarity-based tracking techniques, which are widely used for multi-object tracking with bounding boxes, are difficult to apply. To deal with this problem, we take into account the temporal context of local areas. To aggregate this temporal context, we propose multi-frame attention with feature-level warping. The feature-level warping aligns the features of the same object across multiple frames, and the multi-frame attention then effectively aggregates the temporal context from the warped features. The experimental results show the effectiveness of our method, which outperformed the state-of-the-art method on the DroneCrowd dataset.
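The abstract names two components but does not specify their architecture, so the following is only a minimal NumPy sketch of the general idea under our own assumptions, not the authors' implementation. It assumes a per-pixel displacement field `flow` is already available (for example, from an optical-flow estimator or a learned offset branch); `warp_features` bilinearly samples a neighboring frame's features at the displaced positions to align them with the reference frame, and `multi_frame_attention` then applies scaled dot-product attention over the frames independently at each pixel, using the raw features as queries, keys, and values. Both function names are hypothetical, and a trained model would use learned projections instead of raw features.

```python
import numpy as np


def warp_features(feat: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp a feature map with a per-pixel displacement field.

    feat: (C, H, W) features of a neighboring frame.
    flow: (2, H, W) displacements (dx, dy) from the reference frame into the
          neighboring frame (assumed given; not specified by the abstract).
    Returns (C, H, W) features bilinearly sampled at (x + dx, y + dy);
    coordinates outside the map are clamped to the border.
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    sx = np.clip(xs + flow[0], 0, w - 1)
    sy = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0  # fractional offsets, shape (H, W)
    wy = sy - y0
    # Bilinear blend of the four neighboring feature vectors.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def multi_frame_attention(ref: np.ndarray, warped: list) -> np.ndarray:
    """Per-pixel attention over T frames already aligned to the reference.

    ref: (C, H, W) reference-frame features used as queries.
    warped: list of T arrays of shape (C, H, W) used as keys and values.
    Raw features stand in for learned query/key/value projections.
    """
    stack = np.stack(warped)                                 # (T, C, H, W)
    scores = (stack * ref[None]).sum(axis=1) / np.sqrt(ref.shape[0])  # (T, H, W)
    scores -= scores.max(axis=0, keepdims=True)              # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)            # softmax over frames
    return (weights[:, None] * stack).sum(axis=0)            # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = [rng.standard_normal((8, 16, 16)) for _ in range(3)]
    flows = [np.zeros((2, 16, 16)) for _ in range(3)]  # hypothetical: no motion
    aligned = [warp_features(f, fl) for f, fl in zip(frames, flows)]
    fused = multi_frame_attention(frames[1], aligned)
    print(fused.shape)  # (8, 16, 16)
```

Warping before attention means each pixel attends to features that (ideally) belong to the same small object in every frame, which is what lets a purely local, per-pixel attention aggregate temporal context without bounding-box regions.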

Cite

Text

Asanomi et al. "Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking." Winter Conference on Applications of Computer Vision, 2023.

Markdown

[Asanomi et al. "Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/asanomi2023wacv-multiframe/)

BibTeX

@inproceedings{asanomi2023wacv-multiframe,
  title     = {{Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking}},
  author    = {Asanomi, Takanori and Nishimura, Kazuya and Bise, Ryoma},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2023},
  pages     = {1664--1673},
  url       = {https://mlanthology.org/wacv/2023/asanomi2023wacv-multiframe/}
}