End-to-End Instance Segmentation with Recurrent Attention

Abstract

While convolutional neural networks have recently achieved impressive success in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in a scene. Instance segmentation is important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine large graphical models with low-level vision have been proposed to address this problem; in contrast, we propose an end-to-end recurrent neural network (RNN) architecture with an attention mechanism that models a human-like counting process and produces detailed instance segmentations. The network is jointly trained to sequentially produce regions of interest as well as a dominant object segmentation within each region. The proposed model achieves competitive results on the CVPPP, KITTI, and Cityscapes datasets.

Cite

Text

Ren and Zemel. "End-to-End Instance Segmentation with Recurrent Attention." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.39

Markdown

[Ren and Zemel. "End-to-End Instance Segmentation with Recurrent Attention." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/ren2017cvpr-endtoend/) doi:10.1109/CVPR.2017.39

BibTeX

@inproceedings{ren2017cvpr-endtoend,
  title     = {{End-to-End Instance Segmentation with Recurrent Attention}},
  author    = {Ren, Mengye and Zemel, Richard S.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.39},
  url       = {https://mlanthology.org/cvpr/2017/ren2017cvpr-endtoend/}
}