On the Integration of Grounding Language and Learning Objects
Abstract
This paper presents a multimodal learning system that can ground spoken names of objects in their physical referents and learn to recognize those objects simultaneously from naturally co-occurring multisensory input. There are two technical problems involved: (1) the correspondence problem in symbol grounding – how to associate words (symbols) with their perceptually grounded meanings from multiple co-occurrences between words and objects in the physical environment; (2) object learning – how to recognize and categorize visual objects. We argue that those two problems can be fundamentally simplified by considering them in a general system and incorporating the spatio-temporal and cross-modal constraints of multimodal data. The system collects egocentric data including image sequences as well as speech while users perform natural tasks. It is able to automatically infer the meanings of object names from vision, and categorize objects based on teaching signals potentially encoded in speech. The experimental results reported in this paper reveal the effectiveness of using multimodal data and integrating heterogeneous techniques in machine learning, natural language processing and computer vision.
Cite
Text
Yu and Ballard. "On the Integration of Grounding Language and Learning Objects." AAAI Conference on Artificial Intelligence, 2004.
Markdown
[Yu and Ballard. "On the Integration of Grounding Language and Learning Objects." AAAI Conference on Artificial Intelligence, 2004.](https://mlanthology.org/aaai/2004/yu2004aaai-integration/)
BibTeX
@inproceedings{yu2004aaai-integration,
title = {{On the Integration of Grounding Language and Learning Objects}},
author = {Yu, Chen and Ballard, Dana H.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2004},
pages = {488--494},
url = {https://mlanthology.org/aaai/2004/yu2004aaai-integration/}
}