Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition
Abstract
This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, DCNNs typically require a large-scale annotated dataset for supervised training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which prevents DCNNs from quickly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework to train DCNNs effectively from very limited labeled data together with massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which can successfully guide DCNNs to learn from unlabeled RGB-D data by making full use of the complementary cues of the RGB and depth modalities in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% of the training data labeled, our approach achieves recognition performance competitive with state-of-the-art results reported by fully supervised methods.
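To illustrate the general idea behind two-view co-training on RGB and depth data, here is a minimal, self-contained sketch. It is not the paper's diversity-preserving algorithm: the classifiers, the `centroid_classifier` helper, the margin-based confidence score, and all parameter names are illustrative assumptions, standing in for the DCNNs and selection criteria the paper actually uses.

```python
# Hypothetical two-view co-training sketch (NOT the authors' exact
# diversity-preserving algorithm): each view's classifier pseudo-labels
# the unlabeled samples it is most confident about, and those labels
# augment the OTHER view's training pool.
import math

def centroid_classifier(X, y):
    """Fit per-class centroids; return (predict, confidence) functions."""
    classes = sorted(set(y))
    cents = {c: [sum(x[d] for x, lab in zip(X, y) if lab == c)
                 / sum(1 for lab in y if lab == c)
                 for d in range(len(X[0]))]
             for c in classes}

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    def predict(x):
        return min(classes, key=lambda c: dist(x, cents[c]))

    def confidence(x):
        # margin between nearest and second-nearest class centroid
        d = sorted(dist(x, cents[c]) for c in classes)
        return d[1] - d[0]

    return predict, confidence

def co_train(views, y_labeled, unlabeled_idx, rounds=3, per_round=1):
    """views: e.g. {'rgb': X_rgb, 'depth': X_depth} over all samples;
    y_labeled: {sample index: label} for the labeled subset."""
    labels = {v: dict(y_labeled) for v in views}  # per-view label pools
    pool = set(unlabeled_idx)
    for _ in range(rounds):
        if not pool:
            break
        for src, dst in [('rgb', 'depth'), ('depth', 'rgb')]:
            idx = sorted(labels[src])
            predict, conf = centroid_classifier(
                [views[src][i] for i in idx],
                [labels[src][i] for i in idx])
            # transfer the most confident pseudo-labels to the other view
            ranked = sorted(pool, key=lambda i: -conf(views[src][i]))
            for i in ranked[:per_round]:
                labels[dst][i] = predict(views[src][i])
                pool.discard(i)
    return labels
```

The key design point mirrored from the abstract is that the two modalities act as complementary teachers: a sample that one view classifies confidently supervises the other view, so the unlabeled pool gradually converts into training data for both.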
Cite
Text
Cheng et al. "Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition." International Joint Conference on Artificial Intelligence, 2016.
Markdown
[Cheng et al. "Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/cheng2016ijcai-semi/)
BibTeX
@inproceedings{cheng2016ijcai-semi,
title = {{Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition}},
author = {Cheng, Yanhua and Zhao, Xin and Cai, Rui and Li, Zhiwei and Huang, Kaiqi and Rui, Yong},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2016},
pages = {3345--3351},
url = {https://mlanthology.org/ijcai/2016/cheng2016ijcai-semi/}
}