Detect or Track: Towards Cost-Effective Video Object Detection/Tracking

Abstract

State-of-the-art object detectors and trackers are developing fast. Trackers are in general more efficient than detectors but bear the risk of drifting. A question is hence raised: how can the accuracy of video object detection/tracking be improved by using existing detectors and trackers within a given time budget? A baseline is frame skipping: detecting on every N-th frame and tracking on the frames in between. This baseline, however, is suboptimal, since the detection frequency should depend on the tracking quality. To this end, we propose a scheduler network, a generalization of Siamese trackers, which decides whether to detect or track at a given frame. Although lightweight and simple in structure, the scheduler network is more effective than the frame-skipping baselines and flow-based approaches, as validated on the ImageNet VID dataset for video object detection/tracking.
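To make the contrast in the abstract concrete, here is a minimal sketch of the two scheduling strategies. It is not the paper's method: the proposed scheduler is a learned network, whereas `adaptive_schedule` below substitutes a simple threshold on a hypothetical per-frame tracking-quality score. The function names, the score values, and the threshold are all assumptions made for illustration.

```python
from typing import List, Sequence


def frame_skipping_schedule(num_frames: int, n: int) -> List[str]:
    """Baseline: run the detector on frames 0, n, 2n, ... and track otherwise."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return ["detect" if i % n == 0 else "track" for i in range(num_frames)]


def adaptive_schedule(tracking_quality: Sequence[float], threshold: float) -> List[str]:
    """Illustrative stand-in for a scheduler (NOT the paper's network).

    Detects on the first frame, and on any later frame whose hypothetical
    tracking-quality score falls below `threshold`; tracks otherwise.
    """
    schedule = []
    for i, quality in enumerate(tracking_quality):
        if i == 0 or quality < threshold:
            schedule.append("detect")
        else:
            schedule.append("track")
    return schedule


if __name__ == "__main__":
    # Hypothetical quality scores: tracking degrades at frame 2.
    quality = [1.0, 0.9, 0.4, 0.9, 0.9, 0.9]
    print(frame_skipping_schedule(len(quality), 3))
    print(adaptive_schedule(quality, threshold=0.5))
```

With these assumed scores, the fixed schedule re-detects at frame 3 regardless of how tracking is going, while the quality-driven schedule re-detects at frame 2, where the score drops. Both use two detector calls over six frames, so the difference lies only in where the budget is spent.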

Cite

Text

Luo et al. "Detect or Track: Towards Cost-Effective Video Object Detection/Tracking." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33018803

Markdown

[Luo et al. "Detect or Track: Towards Cost-Effective Video Object Detection/Tracking." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/luo2019aaai-detect/) doi:10.1609/AAAI.V33I01.33018803

BibTeX

@inproceedings{luo2019aaai-detect,
  title     = {{Detect or Track: Towards Cost-Effective Video Object Detection/Tracking}},
  author    = {Luo, Hao and Xie, Wenxuan and Wang, Xinggang and Zeng, Wenjun},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {8803--8810},
  doi       = {10.1609/AAAI.V33I01.33018803},
  url       = {https://mlanthology.org/aaai/2019/luo2019aaai-detect/}
}