BoxMask: Revisiting Bounding Box Supervision for Video Object Detection
Abstract
We present a new, simple yet effective approach to improve video object detection. We observe that prior works operate on instance-level feature aggregation that inherently neglects the refined pixel-level representation, resulting in confusion among objects sharing similar appearance or motion characteristics. To address this limitation, we propose BoxMask, which effectively learns discriminative representations by incorporating class-aware pixel-level information. We simply consider bounding box-level annotations as a coarse mask for each object to supervise our method. The proposed module can be effortlessly integrated into any region-based detector to boost detection. Extensive experiments on ImageNet VID and EPIC KITCHENS datasets demonstrate consistent and significant improvement when we plug our BoxMask module into numerous recent state-of-the-art methods. The code will be available at https://github.com/khurramHashmi/BoxMask.
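The idea of treating box annotations as a coarse, class-aware mask can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `boxes_to_coarse_mask`, the `(x1, y1, x2, y2)` pixel box format, the use of label 0 for background, and the rule of painting larger boxes first where boxes overlap are all assumptions made for illustration.

```python
import numpy as np

def boxes_to_coarse_mask(boxes, labels, height, width, background=0):
    """Rasterize boxes into a (height, width) map of class indices.

    boxes  : sequence of (x1, y1, x2, y2) in pixel coordinates (assumed format)
    labels : sequence of positive integer class ids, one per box
    """
    mask = np.full((height, width), background, dtype=np.int64)
    # Assumption: paint larger boxes first so that smaller boxes stay
    # visible where they overlap a larger one.
    areas = [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes]
    order = np.argsort(areas)[::-1]
    for i in order:
        x1, y1, x2, y2 = boxes[i]
        # Clamp coordinates to the image so slicing never wraps around.
        x1 = min(max(0, int(x1)), width)
        x2 = min(max(0, int(x2)), width)
        y1 = min(max(0, int(y1)), height)
        y2 = min(max(0, int(y2)), height)
        mask[y1:y2, x1:x2] = labels[i]
    return mask

# Hypothetical example: two nested boxes on a 5x5 frame.
example_mask = boxes_to_coarse_mask(
    boxes=[(0, 0, 4, 4), (1, 1, 3, 3)], labels=[1, 2], height=5, width=5
)
```

A mask of this kind could serve as a per-pixel classification target for an auxiliary head, which is one plausible reading of "class-aware pixel-level information"; the paper itself should be consulted for the actual supervision scheme.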
Cite
Text
Hashmi et al. "BoxMask: Revisiting Bounding Box Supervision for Video Object Detection." Winter Conference on Applications of Computer Vision, 2023.
Markdown
[Hashmi et al. "BoxMask: Revisiting Bounding Box Supervision for Video Object Detection." Winter Conference on Applications of Computer Vision, 2023.](https://mlanthology.org/wacv/2023/hashmi2023wacv-boxmask/)
BibTeX
@inproceedings{hashmi2023wacv-boxmask,
title = {{BoxMask: Revisiting Bounding Box Supervision for Video Object Detection}},
author = {Hashmi, Khurram Azeem and Pagani, Alain and Stricker, Didier and Afzal, Muhammad Zeshan},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2023},
pages = {2030-2040},
url = {https://mlanthology.org/wacv/2023/hashmi2023wacv-boxmask/}
}