Learning Semantic Visual Vocabularies Using Diffusion Distance

Abstract

In this paper, we propose a novel approach for learning a generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by a vector of pointwise mutual information (PMI). We assume that, in this midlevel feature space, the features produced by similar sources lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into a semantic lower-dimensional space. Our goal is to construct a compact yet discriminative semantic visual vocabulary. Although the conventional k-means approach works well for vocabulary construction, its performance is sensitive to the size of the visual vocabulary. In addition, the learnt visual words are not semantically meaningful, since the clustering criterion is based on appearance similarity only. Our proposed approach overcomes these problems by capturing the semantic and geometric relations of the feature space using diffusion maps. Unlike some supervised vocabulary construction approaches and unsupervised methods such as pLSA and LDA, diffusion maps can capture the local intrinsic geometric relations between the midlevel feature points on the manifold. We have tested our approach on the KTH action dataset, our own YouTube action dataset, and the fifteen-scene dataset, and have obtained promising results.
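The pipeline described in the abstract (PMI vectors for midlevel features, followed by a diffusion-map embedding in which Euclidean distance approximates diffusion distance) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `pmi_matrix` and `diffusion_map`, the feature-by-document count layout, the add-one smoothing, the Gaussian kernel with bandwidth `sigma`, and the diffusion time `t` are all assumptions chosen for illustration.

```python
import numpy as np


def pmi_matrix(counts, smooth=1.0):
    """PMI between each midlevel feature (row) and each document (column).

    `counts[i, j]` is assumed to hold how often feature i occurs in
    document j. `smooth` is an additive smoothing term (an assumption,
    used here only to avoid log(0)).
    """
    counts = np.asarray(counts, dtype=float) + smooth
    joint = counts / counts.sum()               # p(feature, document)
    p_feat = joint.sum(axis=1, keepdims=True)   # p(feature)
    p_doc = joint.sum(axis=0, keepdims=True)    # p(document)
    return np.log(joint / (p_feat * p_doc))


def diffusion_map(X, n_components=2, sigma=1.0, t=1):
    """Embed the rows of X so that Euclidean distance in the embedding
    approximates diffusion distance at time t.

    Returns (embedding, eigenvalues sorted in descending order).
    """
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    W = np.exp(-d2 / (2.0 * sigma ** 2))        # Gaussian affinities
    deg = W.sum(axis=1)                         # node degrees
    # Symmetric conjugate of the Markov matrix P = D^-1 W; it has the
    # same eigenvalues as P and can be decomposed with eigh.
    S = W / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]          # right eigenvectors of P
    # Skip the trivial constant eigenvector (eigenvalue 1).
    idx = slice(1, n_components + 1)
    return psi[:, idx] * vals[idx] ** t, vals


if __name__ == "__main__":
    # Toy data: features 0-2 co-occur with documents 0-1,
    # features 3-5 with documents 2-3.
    counts = np.array([
        [10, 8, 0, 0],
        [9, 11, 0, 0],
        [12, 9, 1, 0],
        [0, 0, 10, 9],
        [0, 1, 8, 12],
        [0, 0, 11, 10],
    ])
    embedding, eigenvalues = diffusion_map(pmi_matrix(counts))
    print(embedding)
```

Grouping the embedded points, for example with k-means, would be one way to turn the embedding into a compact set of visual words; that step is omitted here.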

Cite

Text

Liu et al. "Learning Semantic Visual Vocabularies Using Diffusion Distance." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2009. doi:10.1109/CVPR.2009.5206845

Markdown

[Liu et al. "Learning Semantic Visual Vocabularies Using Diffusion Distance." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2009.](https://mlanthology.org/cvpr/2009/liu2009cvpr-learning-a/) doi:10.1109/CVPR.2009.5206845

BibTeX

@inproceedings{liu2009cvpr-learning-a,
  title     = {{Learning Semantic Visual Vocabularies Using Diffusion Distance}},
  author    = {Liu, Jingen and Yang, Yang and Shah, Mubarak},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2009},
  pages     = {461-468},
  doi       = {10.1109/CVPR.2009.5206845},
  url       = {https://mlanthology.org/cvpr/2009/liu2009cvpr-learning-a/}
}