Physically Grounded Spatio-Temporal Object Affordances

Abstract

Objects in human environments support various functionalities that govern how people interact with their surroundings in order to perform tasks. In this work, we discuss how to represent and learn a functional understanding of an environment in terms of object affordances. Such an understanding is useful for many applications, such as activity detection and assistive robotics. Starting with a semantic notion of affordances, we present a generative model that takes a given environment and human intention into account, and grounds the affordances in the form of spatial locations on the object and temporal trajectories in the 3D environment. The probabilistic model also allows for uncertainties and variations in the grounded affordances. We apply our approach to RGB-D videos from the Cornell Activity Dataset, where we first show that we can successfully ground the affordances, and then show that learning such affordances improves performance on labeling tasks.

Cite

Text

Koppula and Saxena. "Physically Grounded Spatio-Temporal Object Affordances." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10578-9_54

Markdown

[Koppula and Saxena. "Physically Grounded Spatio-Temporal Object Affordances." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/koppula2014eccv-physically/) doi:10.1007/978-3-319-10578-9_54

BibTeX

@inproceedings{koppula2014eccv-physically,
  title     = {{Physically Grounded Spatio-Temporal Object Affordances}},
  author    = {Koppula, Hema Swetha and Saxena, Ashutosh},
  booktitle = {European Conference on Computer Vision},
  year      = {2014},
  pages     = {831--847},
  doi       = {10.1007/978-3-319-10578-9_54},
  url       = {https://mlanthology.org/eccv/2014/koppula2014eccv-physically/}
}