Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos

Camille Couprie, Clément Farabet, Laurent Najman, Yann LeCun

JMLR 2014 pp. 3489-3511


Abstract

This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. Using frame-by-frame labeling, we obtain nearly state-of-the-art performance on the NYU-v2 depth data set, with an accuracy of 64.5%. We then show that the labeling can be further improved by exploiting the temporal consistency of the video sequence of the scene. To that end, we present a method that produces temporally consistent superpixels from a streaming video. Among the methods that produce superpixel segmentations of an image, the graph-based approach of Felzenszwalb and Huttenlocher is widely used. One of its useful properties is that the regions are computed greedily, in quasi-linear time, using a minimum spanning tree. Within a framework that relies on minimum spanning trees throughout, we propose an efficient video segmentation approach that computes temporally consistent superpixels in a causal manner, meeting the needs of causal, real-time applications. We illustrate the labeling of indoor scenes in video sequences that could be processed in real time on appropriate hardware, such as an FPGA.
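To make the "greedy, quasi-linear, minimum spanning tree" property concrete, here is a minimal sketch of the standard single-image Felzenszwalb and Huttenlocher segmentation. It is not the authors' temporally consistent video method. It assumes a grayscale image given as a list of rows, a 4-connected pixel grid, and absolute intensity difference as the edge weight. The names `felzenszwalb_segment`, `DisjointSet` and the scale parameter `k` are illustrative choices. Edges are visited in Kruskal order, so every merge adds a minimum-spanning-tree edge. Two components merge only if the connecting edge is no heavier than the smaller of their internal differences, each relaxed by `k / size`.

```python
from typing import List, Tuple


class DisjointSet:
    """Union-find tracking each component's size and internal difference,
    i.e. the largest MST edge weight merged into it so far."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int, w: float) -> None:
        # Union by size. Edges arrive in non-decreasing weight order,
        # so w is the new largest MST edge of the merged component.
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = w


def felzenszwalb_segment(image: List[List[float]], k: float) -> List[List[int]]:
    """Graph-based segmentation of a grayscale image (hypothetical helper)."""
    h, w = len(image), len(image[0])
    edges: List[Tuple[float, int, int]] = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:  # right neighbour
                edges.append((abs(image[y][x] - image[y][x + 1]), i, i + 1))
            if y + 1 < h:  # bottom neighbour
                edges.append((abs(image[y][x] - image[y + 1][x]), i, i + w))
    edges.sort()  # Kruskal order; sorting dominates the cost, O(m log m)

    ds = DisjointSet(h * w)
    for weight, u, v in edges:
        a, b = ds.find(u), ds.find(v)
        if a == b:
            continue  # would close a cycle, so not an MST edge
        # Merge predicate: weight <= min(Int(C_a) + k/|C_a|, Int(C_b) + k/|C_b|)
        threshold = min(ds.internal[a] + k / ds.size[a],
                        ds.internal[b] + k / ds.size[b])
        if weight <= threshold:
            ds.union(a, b, weight)

    # Relabel component roots as consecutive integers, in raster order.
    labels: dict = {}
    return [[labels.setdefault(ds.find(y * w + x), len(labels)) for x in range(w)]
            for y in range(h)]


if __name__ == "__main__":
    img = [[0, 0, 10, 10],
           [0, 0, 10, 10]]
    print(felzenszwalb_segment(img, k=1.0))    # two regions
    print(felzenszwalb_segment(img, k=100.0))  # one region
```

On the toy image, `k=1.0` keeps the two flat halves apart and yields `[[0, 0, 1, 1], [0, 0, 1, 1]]`, because the intensity step of 10 exceeds the threshold `0 + 1/4`. With `k=100.0` the threshold rises to 25 and the whole image collapses into a single region. Larger `k` therefore favours larger superpixels.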


Cite

Text

Couprie et al. "Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos." Journal of Machine Learning Research, 2014.

Markdown

[Couprie et al. "Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos." Journal of Machine Learning Research, 2014.](https://mlanthology.org/jmlr/2014/couprie2014jmlr-convolutional/)

BibTeX

@article{couprie2014jmlr-convolutional,
  title     = {{Convolutional Nets and Watershed Cuts for Real-Time Semantic Labeling of RGBD Videos}},
  author    = {Couprie, Camille and Farabet, Clément and Najman, Laurent and LeCun, Yann},
  journal   = {Journal of Machine Learning Research},
  year      = {2014},
  pages     = {3489--3511},
  volume    = {15},
  url       = {https://mlanthology.org/jmlr/2014/couprie2014jmlr-convolutional/}
}