Physical Querying with Multi-Modal Sensing
Abstract
We present Marvin, a system that can search for physical objects using a mobile or wearable device. It integrates HOG-based object recognition, SURF-based localization information, automatic speech recognition, and user feedback with a probabilistic model to recognize the “object of interest” with high accuracy and at interactive speeds. Once the object of interest is recognized, the information that the user is querying (e.g., reviews or options) is displayed on the user’s mobile or wearable device. We tested this prototype in a real-world retail store during business hours, with varying degrees of background noise and clutter. We show that this multi-modal approach achieves superior recognition accuracy compared to using a vision system alone, especially in cluttered scenes where a vision system would be unable to distinguish which object is of interest to the user without additional input. It scales computationally to large numbers of objects by focusing compute-intensive resources on the objects most likely to be of interest, as inferred from user speech and implicit localization information. We present the system architecture, the probabilistic model that integrates the multi-modal information, and empirical results showing the benefits of multi-modal integration.
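The abstract does not spell out the probabilistic model, so the following is only an illustrative sketch of the general idea it describes: cheap cues (speech, localization) narrow the candidate set, and the compute-intensive vision step runs only on the shortlist. It assumes a naive-Bayes-style fusion with conditionally independent modalities, which may differ from the model used in the paper. All function names, object labels, and numbers are hypothetical.

```python
from math import prod


def fuse_modalities(prior, likelihoods, floor=1e-6):
    """Posterior over objects, ASSUMING modalities are conditionally
    independent given the object (a naive-Bayes simplification, not
    necessarily the paper's model).

    prior:       dict mapping object -> P(object)
    likelihoods: list of dicts mapping object -> P(observation | object)
    floor:       small value used when a modality has no score for an object
    """
    unnormalized = {
        obj: p * prod(lk.get(obj, floor) for lk in likelihoods)
        for obj, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {obj: value / total for obj, value in unnormalized.items()}


def shortlist(posterior, k):
    """Return the k most probable objects, most probable first."""
    return sorted(posterior, key=posterior.get, reverse=True)[:k]


# Hypothetical inventory and scores, for illustration only.
prior = {"cereal": 1 / 3, "coffee": 1 / 3, "tea": 1 / 3}
speech = {"cereal": 0.1, "coffee": 0.7, "tea": 0.2}    # from speech recognition
location = {"cereal": 0.2, "coffee": 0.4, "tea": 0.4}  # from localization

# Step 1: fuse the cheap cues and keep only the likeliest candidates.
cheap_posterior = fuse_modalities(prior, [speech, location])
candidates = shortlist(cheap_posterior, k=2)

# Step 2: run the expensive detector on the shortlist only
# (scores here stand in for a real detector's output).
vision = {"coffee": 0.3, "tea": 0.6}
final = fuse_modalities({obj: cheap_posterior[obj] for obj in candidates}, [vision])
best = max(final, key=final.get)
```

In this toy run the vision score alone favors one object, but the fused posterior favors another because the speech and localization evidence outweigh it, which is the kind of disambiguation the abstract attributes to multi-modal integration.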
Cite
Text
Baek et al. "Physical Querying with Multi-Modal Sensing." IEEE/CVF Winter Conference on Applications of Computer Vision, 2014. doi:10.1109/WACV.2014.6836103
Markdown
[Baek et al. "Physical Querying with Multi-Modal Sensing." IEEE/CVF Winter Conference on Applications of Computer Vision, 2014.](https://mlanthology.org/wacv/2014/baek2014wacv-physical/) doi:10.1109/WACV.2014.6836103
BibTeX
@inproceedings{baek2014wacv-physical,
title = {{Physical Querying with Multi-Modal Sensing}},
author = {Baek, Iljoo and Stine, Taylor and Dash, Denver and Xiao, Fanyi and Sheikh, Yaser and Movshovitz-Attias, Yair and Chen, Mei and Hebert, Martial and Kanade, Takeo},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2014},
pages = {183-190},
doi = {10.1109/WACV.2014.6836103},
url = {https://mlanthology.org/wacv/2014/baek2014wacv-physical/}
}