Limited Sampling Reference Frame for MaskTrack R-CNN

Abstract

With the great progress on computer vision tasks such as image classification, object detection, and segmentation, researchers are turning to more complex vision tasks. Video instance segmentation is a new task that requires detecting, segmenting, and tracking instances simultaneously in a video. The Occluded Video Instance Segmentation (OVIS) dataset is used for this task; it includes many heavily occluded scenes, and its videos vary widely in length. To track instances in videos of different lengths, we make several improvements to MaskTrack R-CNN. With these optimizations, the refined model detects and segments instances well and achieves better tracking accuracy in long videos. Furthermore, we apply the Stochastic Weight Averaging training strategy to obtain a better result. Finally, the proposed method achieves an mAP score of 28.9 on the validation set and 32.2 on the test set of the OVIS dataset.

Cite

Text

Li et al. "Limited Sampling Reference Frame for MaskTrack R-CNN." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00430

Markdown

[Li et al. "Limited Sampling Reference Frame for MaskTrack R-CNN." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/li2021iccvw-limited/) doi:10.1109/ICCVW54120.2021.00430

BibTeX

@inproceedings{li2021iccvw-limited,
  title     = {{Limited Sampling Reference Frame for MaskTrack R-CNN}},
  author    = {Li, Zhuang and Cao, Leilei and Wang, Hongbin},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2021},
  pages     = {3847-3850},
  doi       = {10.1109/ICCVW54120.2021.00430},
  url       = {https://mlanthology.org/iccvw/2021/li2021iccvw-limited/}
}