Visual Motif Discovery via First-Person Vision

Abstract

Visual motifs are images of visual experiences that are significant and shared across many people, such as an informative sign viewed by many people or a familiar social situation such as interacting with a clerk at a store. The goal of this study is to discover visual motifs from a collection of first-person videos recorded by a wearable camera. To achieve this goal, we develop a commonality clustering method that leverages three important aspects: inter-video similarity, intra-video sparseness, and people’s visual attention. The problem is posed as normalized spectral clustering and is solved efficiently using a weighted covariance matrix. Experimental results suggest that our method is more accurate and more efficient at visual motif discovery than several state-of-the-art methods.
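The abstract does not spell out how a weighted covariance matrix makes normalized spectral clustering efficient, so the following is only a generic sketch of one well-known way such a reduction can work, not the authors' formulation. It assumes a linear-kernel affinity W = X Xᵀ over nonnegative frame features X (n frames, d dimensions, with d much smaller than n); the function name, the feature matrix, and the choice of kernel are all illustrative assumptions. Under that assumption, the leading eigenvectors of the n × n normalized affinity D^{-1/2} W D^{-1/2} can be recovered from the d × d matrix Xᵀ D^{-1} X, which is a degree-weighted covariance-like matrix.

```python
import numpy as np


def spectral_embedding_via_weighted_covariance(X, k):
    """Top-k eigenpairs of D^{-1/2} (X X^T) D^{-1/2} without forming the n x n matrix.

    Hypothetical helper for illustration only. X is an (n, d) array of
    nonnegative features; the affinity W = X X^T is an assumed choice.
    """
    X = np.asarray(X, dtype=float)

    # Degrees of the implicit affinity: W @ 1 = X @ (X^T @ 1), costing O(n d).
    degrees = X @ X.sum(axis=0)
    if np.any(degrees <= 0):
        raise ValueError("every sample needs a positive degree")

    # Z Z^T equals the normalized affinity D^{-1/2} W D^{-1/2}.
    Z = X / np.sqrt(degrees)[:, None]

    # d x d weighted covariance-like matrix X^T D^{-1} X.
    C = Z.T @ Z

    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:k]
    top_vals = eigvals[order]
    top_vecs = eigvecs[:, order]

    # Map eigenvectors of Z^T Z to unit-norm eigenvectors of Z Z^T.
    U = (Z @ top_vecs) / np.sqrt(np.maximum(top_vals, 1e-12))
    return U, top_vals
```

The rows of `U` would then typically be clustered (for example with k-means) to obtain groups. The cost is dominated by a d × d eigendecomposition rather than an n × n one, which is where the efficiency of this kind of reduction comes from.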

Cite

Text

Yonetani et al. "Visual Motif Discovery via First-Person Vision." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46475-6_12

Markdown

[Yonetani et al. "Visual Motif Discovery via First-Person Vision." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/yonetani2016eccv-visual/) doi:10.1007/978-3-319-46475-6_12

BibTeX

@inproceedings{yonetani2016eccv-visual,
  title     = {{Visual Motif Discovery via First-Person Vision}},
  author    = {Yonetani, Ryo and Kitani, Kris Makoto and Sato, Yoichi},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {187--203},
  doi       = {10.1007/978-3-319-46475-6_12},
  url       = {https://mlanthology.org/eccv/2016/yonetani2016eccv-visual/}
}