Video Scene Understanding Using Multi-Scale Analysis

Abstract

We propose a novel method for automatically discovering the key motion patterns in a scene by observing it over an extended period. Our method does not rely on object detection or tracking; instead, it uses a low-level feature: the direction of pixel-wise optical flow. We first divide the video into clips and estimate a sequence of flow fields for each clip. Each moving pixel is quantized according to its location and motion direction, which yields a bag-of-words representation of each clip. Once this representation is obtained, a screening stage uses a measure called conditional entropy to retain the useful words. We then apply diffusion maps to these words. The diffusion maps framework embeds points on a manifold into a lower-dimensional space while preserving the intrinsic local geometric structure. Finally, the useful words are clustered in the lower-dimensional space to discover key motion patterns. Because the diffusion map embedding depends on a diffusion time parameter, we can detect key motion patterns at different scales through multi-scale analysis. In addition, clips, represented by the frequencies of the motion patterns they contain, can also be clustered to identify multiple dominant motion patterns that occur simultaneously, giving further understanding of the scene. We have tested our approach on two challenging datasets and obtained promising results.
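The pipeline described above can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation. The function names `quantize_flow`, `clip_histogram`, and `diffusion_map` are hypothetical, as are the parameters `cell` (spatial cell size in pixels), `n_dirs` (number of direction bins), `min_mag` (motion threshold), and `sigma` (Gaussian kernel width). It further assumes that each retained word is represented by some feature vector, for example its normalized counts across clips. The conditional-entropy screening and the final clustering step are omitted.

```python
import numpy as np


def quantize_flow(flow, cell=10, n_dirs=8, min_mag=1.0):
    """Map each moving pixel of one flow field (H x W x 2) to a word index.

    Hypothetical scheme: word = (spatial cell, direction bin). Pixels whose
    flow magnitude is below min_mag are treated as static and dropped.
    """
    _, w, _ = flow.shape
    dx, dy = flow[..., 0], flow[..., 1]
    ys, xs = np.nonzero(np.hypot(dx, dy) >= min_mag)
    angle = np.arctan2(dy[ys, xs], dx[ys, xs]) % (2 * np.pi)
    dir_bin = (angle // (2 * np.pi / n_dirs)).astype(int) % n_dirs
    cells_x = -(-w // cell)  # ceiling division: number of cell columns
    cell_id = (ys // cell) * cells_x + (xs // cell)
    return cell_id * n_dirs + dir_bin


def clip_histogram(flows, cell=10, n_dirs=8, min_mag=1.0):
    """Bag-of-words histogram of one clip, summed over its flow fields."""
    h, w, _ = flows[0].shape
    n_cells = (-(-h // cell)) * (-(-w // cell))
    hist = np.zeros(n_cells * n_dirs)
    for flow in flows:
        words = quantize_flow(flow, cell, n_dirs, min_mag)
        hist += np.bincount(words, minlength=hist.size)
    return hist


def diffusion_map(X, sigma, t=1, n_components=2):
    """Diffusion map embedding of the rows of X (one row per word).

    Gaussian kernel K, Markov matrix P = D^-1 K, and coordinates given by
    the non-trivial right eigenvectors of P scaled by eigenvalue**t.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / sigma ** 2)
    d = K.sum(axis=1)
    # A = D^-1/2 K D^-1/2 is symmetric and has the same eigenvalues as P.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]  # largest eigenvalue (1.0) first
    vals, vecs = vals[order], vecs[:, order]
    # Dividing by the top eigenvector (proportional to sqrt(d)) turns the
    # eigenvectors of A into right eigenvectors of P; psi_0 becomes constant.
    psi = vecs / vecs[:, [0]]
    k = slice(1, n_components + 1)  # skip the trivial constant coordinate
    return psi[:, k] * vals[k] ** t
```

Increasing `t` multiplies each coordinate by λ^t, so coordinates with smaller eigenvalues shrink faster and only the coarser structure survives. Clustering the embedding at several values of `t` is one way to obtain the coarse-to-fine view the abstract calls multi-scale analysis.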

Cite

Text

Yang et al. "Video Scene Understanding Using Multi-Scale Analysis." IEEE/CVF International Conference on Computer Vision, 2009. doi:10.1109/ICCV.2009.5459376

Markdown

[Yang et al. "Video Scene Understanding Using Multi-Scale Analysis." IEEE/CVF International Conference on Computer Vision, 2009.](https://mlanthology.org/iccv/2009/yang2009iccv-video/) doi:10.1109/ICCV.2009.5459376

BibTeX

@inproceedings{yang2009iccv-video,
  title     = {{Video Scene Understanding Using Multi-Scale Analysis}},
  author    = {Yang, Yang and Liu, Jingen and Shah, Mubarak},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
  year      = {2009},
  pages     = {1669-1676},
  doi       = {10.1109/ICCV.2009.5459376},
  url       = {https://mlanthology.org/iccv/2009/yang2009iccv-video/}
}