Representing Videos Using Mid-Level Discriminative Patches

Abstract

We propose a representation for videos based on mid-level discriminative spatio-temporal patches. These spatio-temporal patches might correspond to a primitive human action, a semantic object, or perhaps a random but informative spatio-temporal patch in the video. What defines these spatio-temporal patches are their discriminative and representative properties. We automatically mine these patches from hundreds of training videos and experimentally demonstrate that they establish correspondences across videos and align videos, enabling label transfer techniques. Furthermore, these patches can be used as a discriminative vocabulary for action classification, where they demonstrate state-of-the-art performance on the UCF50 and Olympics datasets.
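As a rough illustration of how such a patch vocabulary can be used for classification, the sketch below (Python, assuming NumPy and scikit-learn) max-pools the responses of a set of already-mined patch detectors over each video into a fixed-length descriptor and trains a linear classifier on it. The detector scoring function, the detector and video placeholders, and the toy labels are hypothetical stand-ins, not the pipeline described in the paper.

```python
# Hypothetical sketch only: the detector scoring, the videos, and the labels
# below are stand-ins, not the paper's actual mining/detection pipeline.
import numpy as np
from sklearn.svm import LinearSVC

def score_patch_detector(video, detector):
    """Stand-in for scanning one spatio-temporal patch detector over a video
    and returning its responses at all locations/scales."""
    rng = np.random.default_rng(abs(hash((video, detector))) % (2**32))
    return rng.standard_normal(100)

def video_to_descriptor(video, detectors):
    """Max-pool each detector's responses into one dimension, producing a
    fixed-length 'vocabulary of discriminative patches' descriptor."""
    return np.array([score_patch_detector(video, d).max() for d in detectors])

# Assumed inputs: mined patch detectors plus labeled training videos.
detectors = list(range(50))                       # stand-ins for mined patch detectors
train_videos = [f"video_{i}" for i in range(20)]  # stand-ins for video data
train_labels = np.arange(20) % 2                  # toy binary action labels

X = np.stack([video_to_descriptor(v, detectors) for v in train_videos])
clf = LinearSVC().fit(X, train_labels)            # linear classifier on pooled patch responses
print(clf.predict(X[:3]))
```

Max-pooling detector responses is one common way to turn a detector bank into a fixed-length video descriptor; the paper's own classification setup may differ in its pooling and learning details.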

Cite

Text

Jain et al. "Representing Videos Using Mid-Level Discriminative Patches." Conference on Computer Vision and Pattern Recognition, 2013. doi:10.1109/CVPR.2013.332

Markdown

[Jain et al. "Representing Videos Using Mid-Level Discriminative Patches." Conference on Computer Vision and Pattern Recognition, 2013.](https://mlanthology.org/cvpr/2013/jain2013cvpr-representing/) doi:10.1109/CVPR.2013.332

BibTeX

@inproceedings{jain2013cvpr-representing,
  title     = {{Representing Videos Using Mid-Level Discriminative Patches}},
  author    = {Jain, Arpit and Gupta, Abhinav and Rodriguez, Mikel and Davis, Larry S.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2013},
  doi       = {10.1109/CVPR.2013.332},
  url       = {https://mlanthology.org/cvpr/2013/jain2013cvpr-representing/}
}