Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

Abstract

This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, DCNNs typically require a large-scale annotated dataset for supervised training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which prevents DCNNs from quickly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework to train DCNNs effectively from very limited labeled data together with massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which successfully guides DCNNs to learn from the unlabeled RGB-D data by making full use of the complementary cues of the RGB and depth modalities in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% of the training data labeled, our approach achieves object-recognition performance competitive with state-of-the-art results reported by fully supervised methods.
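The abstract's key ingredient is a co-training loop over two complementary views. As background, a minimal sketch of the *generic* co-training idea (two view-specific classifiers that pseudo-label confident unlabeled examples for each other) is shown below. The 1-D synthetic data, the nearest-centroid classifiers, and the confidence rule are illustrative assumptions only; the paper's actual diversity-preserving algorithm and DCNN models are not reproduced here.

```python
def centroid_fit(xs, ys):
    """Nearest-centroid 'classifier': the mean feature value per class."""
    cents = {}
    for c in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = sum(vals) / len(vals)
    return cents

def centroid_predict(cents, x):
    """Return (label, confidence), with confidence = -distance to centroid."""
    label = min(cents, key=lambda c: abs(x - cents[c]))
    return label, -abs(x - cents[label])

def co_train(view_a, view_b, labels, n_labeled, rounds=3, per_round=2):
    """Generic two-view co-training. Indices < n_labeled are labeled."""
    lab_a = list(range(n_labeled))            # labeled pool seen by view A
    lab_b = list(range(n_labeled))            # labeled pool seen by view B
    pseudo = dict(enumerate(labels[:n_labeled]))
    unlabeled = set(range(n_labeled, len(labels)))
    for _ in range(rounds):
        for src_view, src_lab, dst_lab in ((view_a, lab_a, lab_b),
                                           (view_b, lab_b, lab_a)):
            cents = centroid_fit([src_view[i] for i in src_lab],
                                 [pseudo[i] for i in src_lab])
            scored = [(centroid_predict(cents, src_view[i]), i)
                      for i in unlabeled]
            # Pseudo-label the most confident predictions and hand them
            # to the OTHER view's labeled pool (the co-training step).
            scored.sort(key=lambda t: t[0][1], reverse=True)
            for (lbl, _), i in scored[:per_round]:
                pseudo[i] = lbl
                dst_lab.append(i)
                unlabeled.discard(i)
    return pseudo

# Two synthetic 1-D views (stand-ins for RGB and depth features):
# class 0 clusters near 0.0, class 1 near 10.0. Only 2 of 8 labels are used.
view_a = [0.1, 9.8, 0.4, 10.2, 0.2, 9.9, 0.3, 10.1]
view_b = [0.2, 10.1, 0.1, 9.7, 0.4, 10.0, 0.2, 9.8]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
pseudo = co_train(view_a, view_b, labels, n_labeled=2)
print(pseudo)  # all 8 examples end up (correctly) labeled
```

On this toy, separable data the loop recovers every true label from just two labeled examples; the paper's contribution, per the abstract, is making this style of mutual pseudo-labeling work for DCNNs on real RGB-D data while preserving the diversity of the two modality-specific networks.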

Cite

Text

Cheng et al. "Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Cheng et al. "Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/cheng2016ijcai-semi/)

BibTeX

@inproceedings{cheng2016ijcai-semi,
  title     = {{Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition}},
  author    = {Cheng, Yanhua and Zhao, Xin and Cai, Rui and Li, Zhiwei and Huang, Kaiqi and Rui, Yong},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {3345--3351},
  url       = {https://mlanthology.org/ijcai/2016/cheng2016ijcai-semi/}
}