Incomplete Multi-Modal Visual Data Grouping

Abstract

As technology develops, multi-modal visual data are becoming much easier to access. Nevertheless, an underlying problem lurks behind the emerging multi-modality techniques: what if the data from one or more modalities are missing? Motivated by this question, we propose an unsupervised method that handles incomplete multi-modal data by transforming the original, incomplete data into a new, complete representation in a latent space. Unlike existing efforts that simply project the data from each modality into a common subspace, we propose a novel graph Laplacian term with a clear probabilistic interpretation to couple the incomplete multi-modal samples. In this way, a compact global structure over the entire heterogeneous dataset is preserved, yielding strong grouping discriminability. As a non-trivial contribution, we also provide the optimization solution for the proposed model. In experiments, we extensively test our method and its competitors on one synthetic dataset, two RGB-D video datasets, and two image datasets. The superior results validate the benefits of the proposed method, especially when the multi-modal data suffer from large incompleteness.
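The graph Laplacian coupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' actual formulation: the affinity matrix `W` and the latent representation matrices below are toy assumptions, used only to show how the Laplacian smoothness term tr(VᵀLV) rewards latent representations in which samples connected in the graph stay close.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W from a symmetric affinity matrix W."""
    D = np.diag(W.sum(axis=1))
    return D - W

def laplacian_smoothness(V, L):
    """tr(V^T L V) = 0.5 * sum_ij W_ij * ||v_i - v_j||^2.

    Small when samples connected in the graph have similar latent rows of V,
    which is the kind of structure-preserving coupling the abstract refers to.
    """
    return float(np.trace(V.T @ L @ V))

# Toy graph: 4 samples, two connected pairs {0,1} and {2,3}.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = graph_laplacian(W)

# A latent representation that respects the graph vs. one that does not.
V_good = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
V_bad  = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)

print(laplacian_smoothness(V_good, L))  # 0.0 -- connected samples agree
print(laplacian_smoothness(V_bad, L))   # 4.0 -- connected samples disagree
```

In the paper's setting, minimizing such a term over a latent representation shared across modalities couples samples even when some modality's features are missing, since the graph structure carries information between the observed views.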

Cite

Text

Zhao et al. "Incomplete Multi-Modal Visual Data Grouping." International Joint Conference on Artificial Intelligence, 2016.

Markdown

[Zhao et al. "Incomplete Multi-Modal Visual Data Grouping." International Joint Conference on Artificial Intelligence, 2016.](https://mlanthology.org/ijcai/2016/zhao2016ijcai-incomplete/)

BibTeX

@inproceedings{zhao2016ijcai-incomplete,
  title     = {{Incomplete Multi-Modal Visual Data Grouping}},
  author    = {Zhao, Handong and Liu, Hongfu and Fu, Yun},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {2392--2398},
  url       = {https://mlanthology.org/ijcai/2016/zhao2016ijcai-incomplete/}
}