Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline

Abstract

We endeavor on a rarely explored task named Insubstantial Object Detection (IOD), which aims to localize the object with following characteristics: (1) amorphous shape with indistinct boundary; (2) similarity to surroundings; (3) absence in color. Accordingly, it is far more challenging to distinguish insubstantial objects in a single static frame and the collaborative representation of spatial and temporal information is crucial. Thus, we construct an IOD-Video dataset comprised of 600 videos (141,017 frames) covering various distances, sizes, visibility, and scenes captured by different spectral ranges. In addition, we develop a spatio-temporal aggregation framework for IOD, in which different backbones are deployed and a spatio-temporal aggregation loss (STAloss) is elaborately designed to leverage the consistency along the time axis. Experiments conducted on IOD-Video dataset demonstrate that spatio-temporal aggregation can significantly improve the performance of IOD. We hope our work will attract further researches into this valuable yet challenging task. The code will be available at: https://github.com/CalayZhou/IOD-Video.

Cite

Text

Zhou et al. "Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00311

Markdown

[Zhou et al. "Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhou2022cvpr-explore/) doi:10.1109/CVPR52688.2022.00311

BibTeX

@inproceedings{zhou2022cvpr-explore,
  title     = {{Explore Spatio-Temporal Aggregation for Insubstantial Object Detection: Benchmark Dataset and Baseline}},
  author    = {Zhou, Kailai and Wang, Yibo and Lv, Tao and Li, Yunqian and Chen, Linsen and Shen, Qiu and Cao, Xun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {3104--3115},
  doi       = {10.1109/CVPR52688.2022.00311},
  url       = {https://mlanthology.org/cvpr/2022/zhou2022cvpr-explore/}
}