Physically Grounded Spatio-Temporal Object Affordances
Abstract
Objects in human environments support various functionalities which govern how people interact with their environments to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications, such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account, and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also captures uncertainties and variations in the grounded affordances. We apply our approach to RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and we then show that learning such affordances improves performance on the labeling tasks.
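To make the idea concrete, the following is a minimal toy sketch (not the paper's actual model) of the kind of generative grounding the abstract describes: given an object and a human intention, sample a spatial contact point on the object and a short 3D trajectory toward it, with Gaussian noise standing in for the model's uncertainty. The objects, intentions, prior parameters, and function names here are all illustrative assumptions.

```python
import random

# Hypothetical affordance priors (illustrative values, not from the paper):
# a mean contact point (x, y, z) on the object and a standard deviation
# capturing spatial uncertainty for each (object, intention) pair.
AFFORDANCE_PRIORS = {
    ("cup", "drink"): {"mean": (0.0, 0.05, 0.10), "std": 0.01},
    ("cup", "move"):  {"mean": (0.0, 0.00, 0.05), "std": 0.02},
}

def sample_grounding(obj, intention, start=(0.5, 0.5, 0.5), steps=5, rng=None):
    """Sample a (contact_point, trajectory) grounding for an affordance.

    Spatial grounding: a noisy contact point drawn around the prior mean.
    Temporal grounding: a noisy, linearly interpolated 3D trajectory from
    the start pose to the sampled contact point.
    """
    rng = rng or random.Random(0)
    prior = AFFORDANCE_PRIORS[(obj, intention)]
    # Sample the spatial grounding on the object.
    contact = tuple(m + rng.gauss(0.0, prior["std"]) for m in prior["mean"])
    # Sample the temporal grounding: steps+1 points from start to contact.
    traj = []
    for t in range(steps + 1):
        a = t / steps
        traj.append(tuple(
            (1 - a) * s + a * c + rng.gauss(0.0, 0.005)
            for s, c in zip(start, contact)
        ))
    return contact, traj

contact, traj = sample_grounding("cup", "drink")
print(len(traj), contact)
```

Re-sampling with different random seeds yields different but plausible groundings for the same semantic affordance, which is the variation the probabilistic model is meant to represent.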
Cite
Text
Koppula and Saxena. "Physically Grounded Spatio-Temporal Object Affordances." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10578-9_54

Markdown

[Koppula and Saxena. "Physically Grounded Spatio-Temporal Object Affordances." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/koppula2014eccv-physically/) doi:10.1007/978-3-319-10578-9_54

BibTeX
@inproceedings{koppula2014eccv-physically,
title = {{Physically Grounded Spatio-Temporal Object Affordances}},
author = {Koppula, Hema Swetha and Saxena, Ashutosh},
booktitle = {European Conference on Computer Vision},
year = {2014},
pages = {831--847},
doi = {10.1007/978-3-319-10578-9_54},
url = {https://mlanthology.org/eccv/2014/koppula2014eccv-physically/}
}