Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation

Lee, Jungbeom; Kim, Eunji; Lee, Sungmin; Lee, Jangho; Yoon, Sungroh

doi:10.1109/ICCV.2019.00691

ICCV 2019

Abstract

When a deep neural network is trained on data with only image-level labeling, the regions activated in each image tend to cover only a small part of the target object. We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in a static image. The temporal variations in a video allow different regions of the target object to be activated. We obtain an activated region in each frame of a video, and then aggregate the regions from successive frames into a single image, using a warping technique based on optical flow. The resulting localization maps cover more of the target object, and can then be used as proxy ground truth to train a segmentation network. This simple approach outperforms existing methods under the same level of supervision, and even methods that rely on extra annotations. Based on VGG-16 and ResNet-101 backbones, our method achieves mIoU of 65.0 and 67.4, respectively, on PASCAL VOC 2012 test images, which represents a new state of the art.
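The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-frame activation maps and the optical flow fields are already available, uses nearest-neighbour backward warping, and merges maps with an element-wise maximum, which is an assumed choice of aggregation operator. The function names `warp_to_reference` and `aggregate` are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
def warp_to_reference(act_map, flow):
    """Warp a neighbouring frame's activation map onto the reference frame.

    Assumption: flow[y][x] = (dx, dy) points from reference pixel (x, y)
    to the matching location in the neighbouring frame (backward warping).
    Sampling is nearest-neighbour; samples outside the frame stay at 0.
    """
    h, w = len(act_map), len(act_map[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx, sy = round(x + dx), round(y + dy)
            if 0 <= sx < w and 0 <= sy < h:
                warped[y][x] = act_map[sy][sx]
    return warped


def aggregate(ref_map, neighbour_maps, flows):
    """Merge the reference map with warped neighbour maps.

    Assumption: element-wise maximum is used as the aggregation operator.
    """
    agg = [row[:] for row in ref_map]
    for act_map, flow in zip(neighbour_maps, flows):
        warped = warp_to_reference(act_map, flow)
        agg = [[max(a, b) for a, b in zip(row_a, row_b)]
               for row_a, row_b in zip(agg, warped)]
    return agg


# Toy data: the reference frame activates one part of the object, while the
# next frame (object shifted 2 px to the right) activates a different part.
H, W = 4, 6
ref = [[0.0] * W for _ in range(H)]
ref[1][1] = 1.0
nxt = [[0.0] * W for _ in range(H)]
nxt[1][4] = 1.0
flow = [[(2.0, 0.0)] * W for _ in range(H)]  # uniform 2 px shift in x

merged = aggregate(ref, [nxt], [flow])
```

In this toy case the merged map is active at two adjacent pixels of the reference frame, whereas either frame alone activates only one, which is the effect the abstract attributes to aggregating across frames.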

Cite

Text

Lee et al. "Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00691

Markdown

[Lee et al. "Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/lee2019iccv-frametoframe/) doi:10.1109/ICCV.2019.00691

BibTeX

@inproceedings{lee2019iccv-frametoframe,
  title     = {{Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation}},
  author    = {Lee, Jungbeom and Kim, Eunji and Lee, Sungmin and Lee, Jangho and Yoon, Sungroh},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00691},
  url       = {https://mlanthology.org/iccv/2019/lee2019iccv-frametoframe/}
}