A Multi-Scale CNN for Affordance Segmentation in RGB Images
Abstract
Given a single RGB image, our goal is to label every pixel with an affordance type. By affordance, we mean an object’s capability to readily support a certain human action, without requiring precursor actions. We focus on segmenting the following five affordance types in indoor scenes: ‘walkable’, ‘sittable’, ‘lyable’, ‘reachable’, and ‘movable’. Our approach uses a deep architecture, consisting of a number of multi-scale convolutional neural networks, for extracting mid-level visual cues and combining them toward affordance segmentation. The mid-level cues include a depth map, surface normals, and segmentation of four types of surfaces – namely, floor, structure, furniture, and props. For evaluation, we augmented the NYUv2 dataset with new ground-truth annotations of the five affordance types. We are not aware of prior work that starts from pixels, infers mid-level cues, and combines them in a feed-forward fashion for predicting dense affordance maps of a single RGB image.
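The feed-forward combination of mid-level cues described above can be illustrated with a minimal toy sketch. This is not the paper's architecture: the cue shapes, the single 1×1-convolution fusion layer, and the random weights below are illustrative assumptions; only the cue types (depth, normals, four surface classes) and the five affordance classes come from the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_cues(depth, normals, surface_scores, weights, bias):
    """Toy fusion step: stack per-pixel mid-level cues channel-wise and
    apply a 1x1 convolution (i.e. a per-pixel linear layer) to score
    each of the 5 affordance classes. Hypothetical, for illustration."""
    # depth: HxW, normals: HxWx3, surface_scores: HxWx4  ->  cues: HxWx8
    cues = np.concatenate([depth[..., None], normals, surface_scores], axis=-1)
    logits = cues @ weights + bias          # HxWx5 affordance logits
    return softmax(logits)                  # per-pixel class distribution

H, W = 4, 6
rng = np.random.default_rng(0)
depth = rng.random((H, W))                          # predicted depth cue
normals = rng.random((H, W, 3))                     # surface-normal cue
surface_scores = softmax(rng.random((H, W, 4)))     # floor/structure/furniture/props
weights = rng.standard_normal((8, 5)) * 0.1         # 8 cue channels -> 5 affordances
bias = np.zeros(5)

affordance_map = fuse_cues(depth, normals, surface_scores, weights, bias)
labels = affordance_map.argmax(axis=-1)             # dense per-pixel affordance labels
```

In the paper the cues themselves are produced by multi-scale CNNs and the fusion is learned end-to-end; the sketch only shows the shape bookkeeping of combining heterogeneous per-pixel cues into a dense 5-class affordance map.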
Cite
Text
Roy and Todorovic. "A Multi-Scale CNN for Affordance Segmentation in RGB Images." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46493-0_12
Markdown
[Roy and Todorovic. "A Multi-Scale CNN for Affordance Segmentation in RGB Images." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/roy2016eccv-multi/) doi:10.1007/978-3-319-46493-0_12
BibTeX
@inproceedings{roy2016eccv-multi,
title = {{A Multi-Scale CNN for Affordance Segmentation in RGB Images}},
author = {Roy, Anirban and Todorovic, Sinisa},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {186--201},
doi = {10.1007/978-3-319-46493-0_12},
url = {https://mlanthology.org/eccv/2016/roy2016eccv-multi/}
}