Visual Motif Discovery via First-Person Vision
Abstract
Visual motifs are images of visual experiences that are significant and shared across many people, such as an informative sign that many people look at, or a familiar social situation such as interacting with a clerk at a store. The goal of this study is to discover visual motifs from a collection of first-person videos recorded by a wearable camera. To achieve this goal, we develop a commonality clustering method that leverages three important aspects: inter-video similarity, intra-video sparseness, and people’s visual attention. The problem is posed as normalized spectral clustering and is solved efficiently using a weighted covariance matrix. Experimental results suggest that our method outperforms several state-of-the-art methods in both the accuracy and the efficiency of visual motif discovery.
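The abstract does not spell out how a covariance matrix makes spectral clustering efficient, so the following is only a minimal sketch of one standard way this can work, not the authors' formulation. It assumes a linear-kernel affinity A = X Xᵀ over nonnegative feature vectors (an assumption made here for illustration), in which case the leading eigenvectors of the normalized affinity D^{-1/2} A D^{-1/2} can be recovered from the small d × d matrix Xᵀ D⁻¹ X instead of the n × n affinity. The function name and the random features are hypothetical, and the paper's inter-video similarity, intra-video sparseness, and attention weighting are not modeled.

```python
import numpy as np


def spectral_embedding_via_weighted_covariance(X, k):
    """Top-k eigenpairs of D^{-1/2} (X X^T) D^{-1/2} without forming the n x n matrix.

    X is an (n, d) array of nonnegative features; k must not exceed rank(X).
    Illustrative sketch only: the linear-kernel affinity is an assumption.
    """
    X = np.asarray(X, dtype=float)

    # Degree of sample i under A = X X^T is x_i . (sum_j x_j), computed in O(n d).
    degrees = X @ X.sum(axis=0)
    if np.any(degrees <= 0):
        raise ValueError("every sample needs positive total affinity")

    # Z = D^{-1/2} X, so that Z Z^T is the normalized affinity.
    Z = X / np.sqrt(degrees)[:, None]

    # d x d degree-weighted second-moment (covariance-like) matrix X^T D^{-1} X.
    C = Z.T @ Z

    # eigh returns eigenvalues in ascending order; keep the k largest.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1][:k]
    lam = eigvals[order]
    V = eigvecs[:, order]

    # If C v = lam v with lam > 0, then u = Z v / sqrt(lam) is a unit
    # eigenvector of Z Z^T with the same eigenvalue.
    U = (Z @ V) / np.sqrt(lam)
    return lam, U
```

The rows of `U` could then be grouped with any standard clustering step (k-means is a common choice). The point of the sketch is the cost: the eigenproblem is solved on a d × d matrix, which is cheap when the feature dimension is much smaller than the number of frames.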
Cite
Text
Yonetani et al. "Visual Motif Discovery via First-Person Vision." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46475-6_12
Markdown
[Yonetani et al. "Visual Motif Discovery via First-Person Vision." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/yonetani2016eccv-visual/) doi:10.1007/978-3-319-46475-6_12
BibTeX
@inproceedings{yonetani2016eccv-visual,
title = {{Visual Motif Discovery via First-Person Vision}},
author = {Yonetani, Ryo and Kitani, Kris Makoto and Sato, Yoichi},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {187-203},
doi = {10.1007/978-3-319-46475-6_12},
url = {https://mlanthology.org/eccv/2016/yonetani2016eccv-visual/}
}