Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions

Abstract

A major goal of grounded language learning research is to enable robots to connect language predicates to a robot's physical interactive perception of the world. Coupling object exploratory behaviors such as grasping, lifting, and looking with multiple sensory modalities (e.g., audio, haptics, and vision) enables a robot to ground non-visual words like "heavy" as well as visual words like "red". A major limitation of existing approaches to multi-modal language grounding is that a robot must exhaustively explore training objects with a variety of actions whenever it learns a new language predicate. This paper proposes a method for guiding a robot's behavioral exploration policy when learning a novel predicate, based on known grounded predicates and the novel predicate's linguistic relationship to them. We demonstrate our approach on two datasets in which a robot explored large sets of objects and was tasked with learning to recognize whether novel words applied to those objects.
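
To make the core idea concrete, the sketch below illustrates one plausible way such guidance could work: transferring per-behavior reliability scores from linguistically similar, already-grounded predicates to a novel predicate, then ranking behaviors by the transferred scores. This is a minimal, hypothetical illustration rather than the authors' implementation; the predicate names, reliability values, and the `similarity` function are all placeholder assumptions.

```python
import numpy as np

# Hypothetical per-behavior reliability scores (e.g., Cohen's kappa) for
# predicates the robot has already grounded. All names and numbers are
# illustrative placeholders, not data from the paper.
known_predicate_reliability = {
    "heavy": {"lift": 0.72, "grasp": 0.35, "look": 0.05},
    "red":   {"lift": 0.02, "grasp": 0.04, "look": 0.81},
    "soft":  {"lift": 0.10, "grasp": 0.64, "look": 0.12},
}

def rank_behaviors(novel_predicate, similarity, known=known_predicate_reliability):
    """Rank exploratory behaviors for a novel predicate by transferring
    reliability from linguistically related known predicates.

    `similarity(a, b)` is assumed to return a score in [0, 1], e.g. cosine
    similarity of word embeddings.
    """
    behaviors = {b for scores in known.values() for b in scores}
    ranked = {}
    for behavior in behaviors:
        weights, values = [], []
        for predicate, scores in known.items():
            weights.append(similarity(novel_predicate, predicate))
            values.append(scores.get(behavior, 0.0))
        # Similarity-weighted average of the known reliabilities for this behavior.
        ranked[behavior] = float(np.average(values, weights=weights)) if sum(weights) > 0 else 0.0
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: treat "weighty" as highly similar to "heavy", so "lift" should rank first.
toy_sim = lambda a, b: 1.0 if {a, b} == {"weighty", "heavy"} else 0.1
print(rank_behaviors("weighty", toy_sim))
```

Under this sketch, the robot would try the top-ranked behaviors first when gathering examples for the novel word, rather than exhaustively applying every action to every training object.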

Cite

Text

Thomason et al. "Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11966

Markdown

[Thomason et al. "Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/thomason2018aaai-guiding/) doi:10.1609/AAAI.V32I1.11966

BibTeX

@inproceedings{thomason2018aaai-guiding,
  title     = {{Guiding Exploratory Behaviors for Multi-Modal Grounding of Linguistic Descriptions}},
  author    = {Thomason, Jesse and Sinapov, Jivko and Mooney, Raymond J. and Stone, Peter},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {5520--5527},
  doi       = {10.1609/AAAI.V32I1.11966},
  url       = {https://mlanthology.org/aaai/2018/thomason2018aaai-guiding/}
}