Triply Supervised Decoder Networks for Joint Detection and Segmentation
Abstract
Joint object detection and semantic segmentation is essential in many fields, such as self-driving cars. An initial attempt towards this goal is to simply share a single network for multi-task learning. We argue that this does not make full use of the fact that detection and segmentation are mutually beneficial. In this paper, we propose a framework called TripleNet to deeply boost these two tasks. On the one hand, to deeply join the two tasks at different scales, triple supervisions, consisting of a detection-oriented supervision and class-aware and class-agnostic segmentation supervisions, are imposed on each layer of the decoder. Class-agnostic segmentation provides an objectness prior to detection and segmentation. On the other hand, to further intercross the two tasks and refine the features at each scale, two light-weight modules (i.e., the inner-connected module and the attention skip-layer fusion) are incorporated. Because the segmentation supervision on each decoder layer is not performed at the test stage and the two added modules are light-weight, the proposed TripleNet can run at real-time speed (16 fps). Experiments on the VOC 2007/2012 and COCO datasets show that TripleNet outperforms all the other one-stage methods on both tasks (e.g., 81.9% mAP and 83.3% mIoU on VOC 2012, and 37.1% mAP and 59.6% mIoU on COCO) using a single network.
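The idea of imposing three supervisions on every decoder layer can be illustrated with a minimal sketch. The abstract does not give the loss functions or their weights, so everything below is an assumption made for illustration: per-pixel softmax cross-entropy for both segmentation terms, class-agnostic labels obtained by collapsing every non-background class to 1, a detection loss passed in as a precomputed number, and the hypothetical names `triple_supervised_loss`, `w_det`, `w_aware`, and `w_agnostic`.

```python
import math

def softmax_cross_entropy(logits, label):
    """Cross-entropy of one pixel's class logits against an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def to_class_agnostic(labels):
    """Assumed mapping: background (0) stays 0, any object class becomes 1."""
    return [1 if c > 0 else 0 for c in labels]

def mean_pixel_loss(pixel_logits, labels):
    """Average cross-entropy over a flattened list of pixels."""
    losses = [softmax_cross_entropy(z, y) for z, y in zip(pixel_logits, labels)]
    return sum(losses) / len(losses)

def triple_supervised_loss(layers, w_det=1.0, w_aware=1.0, w_agnostic=1.0):
    """Sum detection, class-aware, and class-agnostic losses over decoder layers.

    Each entry of `layers` is a dict (hypothetical layout) with:
      det_loss        - precomputed detection loss for that layer
      aware_logits    - per-pixel logits over the semantic classes
      agnostic_logits - per-pixel logits over {background, object}
      labels          - per-pixel class-aware ground truth at that scale
    """
    total = 0.0
    for layer in layers:
        aware = mean_pixel_loss(layer["aware_logits"], layer["labels"])
        agnostic = mean_pixel_loss(
            layer["agnostic_logits"], to_class_agnostic(layer["labels"])
        )
        total += w_det * layer["det_loss"] + w_aware * aware + w_agnostic * agnostic
    return total
```

In this sketch only the training objective carries the extra segmentation terms, which is consistent with the abstract's remark that the per-layer segmentation supervision is dropped at the test stage.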
Cite
Text
Cao et al. "Triply Supervised Decoder Networks for Joint Detection and Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.00757
Markdown
[Cao et al. "Triply Supervised Decoder Networks for Joint Detection and Segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/cao2019cvpr-triply/) doi:10.1109/CVPR.2019.00757
BibTeX
@inproceedings{cao2019cvpr-triply,
title = {{Triply Supervised Decoder Networks for Joint Detection and Segmentation}},
author = {Cao, Jiale and Pang, Yanwei and Li, Xuelong},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2019},
doi = {10.1109/CVPR.2019.00757},
url = {https://mlanthology.org/cvpr/2019/cao2019cvpr-triply/}
}