Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation

Abstract

We present topic-regression multi-modal Latent Dirichlet Allocation (tr-mmLDA), a novel statistical topic model for the task of image and video annotation. At the heart of our new annotation model lies a latent variable regression approach that captures correlations between image or video features and annotation texts. Instead of sharing a set of latent topics between the two data modalities, as in the formulation of correspondence LDA in [2], our approach introduces a regression module to correlate the two sets of topics, which captures more general forms of association and allows the number of topics in the two data modalities to differ. We demonstrate the power of tr-mmLDA on two standard annotation datasets: a 5000-image subset of COREL and a 2687-image LabelMe dataset. The proposed association model shows improved performance over correspondence LDA as measured by caption perplexity.
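The abstract reports results in caption perplexity but does not define it. A common definition is the exponentiated negative average log-likelihood of held-out caption words given the image, where lower is better. The sketch below assumes that definition. The function name `caption_perplexity` and the input layout are hypothetical, and the paper's exact evaluation protocol may differ.

```python
import math


def caption_perplexity(log_probs_per_caption):
    """Perplexity over held-out captions (assumed standard definition).

    log_probs_per_caption: one list per test image, holding the natural-log
    probability the model assigns to each caption word given that image.
    This input layout is an assumption made for illustration.
    """
    # Sum log-likelihoods and word counts across all test captions.
    total_log_prob = sum(sum(lps) for lps in log_probs_per_caption)
    total_words = sum(len(lps) for lps in log_probs_per_caption)
    if total_words == 0:
        raise ValueError("no caption words to evaluate")
    # exp of the negative mean log-likelihood per caption word.
    return math.exp(-total_log_prob / total_words)
```

Under this definition, a model that assigns every caption word a probability of 1/V has perplexity V, so the score can be read as an effective vocabulary size the model is choosing among.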

Cite

Text

Putthividhya et al. "Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010. doi:10.1109/CVPR.2010.5540000

Markdown

[Putthividhya et al. "Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010.](https://mlanthology.org/cvpr/2010/putthividhya2010cvpr-topic/) doi:10.1109/CVPR.2010.5540000

BibTeX

@inproceedings{putthividhya2010cvpr-topic,
  title     = {{Topic Regression Multi-Modal Latent Dirichlet Allocation for Image Annotation}},
  author    = {Putthividhya, Duangmanee and Attias, Hagai Thomas and Nagarajan, Srikantan S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2010},
  pages     = {3408--3415},
  doi       = {10.1109/CVPR.2010.5540000},
  url       = {https://mlanthology.org/cvpr/2010/putthividhya2010cvpr-topic/}
}