Modeling the Temporal Extent of Actions

Abstract

In this paper, we present a framework for estimating what portions of videos are most discriminative for the task of action recognition. We explore the impact of the temporal cropping of training videos on the overall accuracy of an action recognition system, and we formalize what makes a set of croppings optimal. In addition, we present an algorithm to determine the best set of croppings for a dataset, and experimentally show that our approach increases the accuracy of various state-of-the-art action recognition techniques.

Cite

Text

Satkin and Hebert. "Modeling the Temporal Extent of Actions." European Conference on Computer Vision, 2010. doi:10.1007/978-3-642-15549-9_39

Markdown

[Satkin and Hebert. "Modeling the Temporal Extent of Actions." European Conference on Computer Vision, 2010.](https://mlanthology.org/eccv/2010/satkin2010eccv-modeling/) doi:10.1007/978-3-642-15549-9_39

BibTeX

@inproceedings{satkin2010eccv-modeling,
  title     = {{Modeling the Temporal Extent of Actions}},
  author    = {Satkin, Scott and Hebert, Martial},
  booktitle = {European Conference on Computer Vision},
  year      = {2010},
  pages     = {536--548},
  doi       = {10.1007/978-3-642-15549-9_39},
  url       = {https://mlanthology.org/eccv/2010/satkin2010eccv-modeling/}
}