Multimodal Poisson Gamma Belief Network

Abstract

To learn a deep generative model of multimodal data, we propose a multimodal Poisson gamma belief network (mPGBN) that tightly couples the data of different modalities at multiple hidden layers. The mPGBN extracts a nonnegative latent representation in an unsupervised manner using an upward-downward Gibbs sampler. It imposes sparse connections between different layers, making it simple to visualize the generative process and the relationships between the latent features of different modalities. Our experimental results on bimodal data consisting of images and tags show that the mPGBN can easily impute a missing modality, and hence is useful for both image annotation and image retrieval. We further demonstrate that the mPGBN achieves state-of-the-art results in unsupervised latent feature extraction from multimodal data.
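To make the abstract concrete, below is a minimal NumPy sketch of the kind of top-down generative draw it describes: gamma-distributed hidden units shared across modalities, with modality-specific factor loadings emitting Poisson counts for each modality. The layer sizes, hyperparameters, and two-layer depth are illustrative assumptions, not the paper's exact construction or its Gibbs sampler.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer widths: K2 top-layer units, K1 shared first-layer
# units, and two observed count modalities (e.g. image features and tags).
K2, K1 = 5, 20
V_img, V_tag = 100, 50

def simplex_cols(rows, cols):
    # Factor-loading matrix with columns on the probability simplex,
    # as is standard in Poisson gamma belief networks.
    Phi = rng.gamma(0.05, size=(rows, cols))
    return Phi / Phi.sum(axis=0, keepdims=True)

Phi1_img = simplex_cols(V_img, K1)  # loads shared layer-1 units onto images
Phi1_tag = simplex_cols(V_tag, K1)  # loads the same shared units onto tags
Phi2     = simplex_cols(K1, K2)     # connects layer 2 to layer 1

# Top-down draw for one observation: the two modalities are coupled
# because they share the same gamma hidden units theta1 and theta2.
r, c = np.full(K2, 0.1), 1.0
theta2 = rng.gamma(r, 1.0 / c)              # theta^(2) ~ Gam(r, 1/c)
theta1 = rng.gamma(Phi2 @ theta2, 1.0 / c)  # theta^(1) ~ Gam(Phi^(2) theta^(2), 1/c)
x_img = rng.poisson(Phi1_img @ theta1)      # image counts
x_tag = rng.poisson(Phi1_tag @ theta1)      # tag counts

Because both modalities are generated from the same nonnegative latent units, a missing modality can in principle be imputed by inferring the shared units from the observed one, which is what makes the model applicable to image annotation and retrieval.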

Cite

Text

Wang et al. "Multimodal Poisson Gamma Belief Network." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11846

Markdown

[Wang et al. "Multimodal Poisson Gamma Belief Network." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/wang2018aaai-multimodal/) doi:10.1609/AAAI.V32I1.11846

BibTeX

@inproceedings{wang2018aaai-multimodal,
  title     = {{Multimodal Poisson Gamma Belief Network}},
  author    = {Wang, Chaojie and Chen, Bo and Zhou, Mingyuan},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {2492--2499},
  doi       = {10.1609/AAAI.V32I1.11846},
  url       = {https://mlanthology.org/aaai/2018/wang2018aaai-multimodal/}
}